[comp.unix.wizards] Need help using /usr/lib/cpp for generic text

verber@pacific.mps.ohio-state.edu (Mark A. Verber) (09/21/89)

I need some help using /usr/lib/cpp.  I am using cpp for the
conditionals and for its macros.  I have a rather large document (an
Introductory Facilities Guide for Unix, Tops-20, Macintosh, and
shortly IBM-PC, and Vax/VMS).  Originally this guide was for a single
site, but I have been working hard to make it very generic.  The
intention is that any site could easily use this guide for their local
operation with minimal fuss: They would need to change some macros
(like the name of the organization), set some flags as to what machines
they have, and edit a few files that are broken out of the rest of the
documents since we know each site will be different.

The alpha version of the document used TeX \defs and a simple ifdef
macro that I wrote.  It has become clear that this isn't enough so
I decided to move to using make and cpp.

I have run into two problems with cpp that I hope someone could help
me with. 

(1)  Bloody "# line-number file-name" lines

I had thought that the -P switch suppressed such output, but that doesn't
seem to be the case (SunOS 4.x /usr/lib/cpp).  I don't want these lines.

(2)  Leaving <cr> in the text

When I run text like:   I get the output like:   I would like:

    #define foo         before                   before
    before                                       test
    #ifdef foo          test                     after
    test
    #endif              after
    after

Any suggestions on getting cpp to eat the <cr> at the end of the control
lines?  Is there a PD cpp or other macro processor that will do the
job for me... or should I pull out ye old perl manual?

-- 
Mark A. Verber
System Programmer, Physics Department, Ohio State University
verber@pacific.mps.ohio-state.edu
(614) 292-8002

jik@Athena.MIT.EDU (Jonathan I. Kamens) (09/22/89)

In article <836@pacific.mps.ohio-state.edu>, verber@pacific.mps.ohio-state.edu
(Mark A. Verber) writes:
> [Various problems with using cpp for TeX text files.]

I use the m4 macro preprocessor when I need that functionality for text
files, and I find that it's more convenient for text files, at least
partially because you can do expansions inside quotes and it won't treat
the quoted strings specially.

I believe (but I'm not sure) that m4 is freely redistributable (but not
in the public domain, since I think it's Berkeley code).  If it is, you
should be able to ftp to uunet.uu.net and snarf it from the Berkeley
sources that are available there.

Make sure to get a man page too :-)

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-4261			      Home: 617-782-0710

maart@cs.vu.nl (Maarten Litmaath) (09/22/89)

verber@pacific.mps.ohio-state.edu (Mark A. Verber) writes:
\...
\(1)  Bloody "# line-number file-name" lines
\
\I had thought that the -P switch suppressed such output, but that doesn't
\seem to be the case (SunOS 4.x /usr/lib/cpp).  I don't want these lines.

Huh?  `-P' works for me on SunOS 4.0.1!
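
If a particular cpp does emit the markers despite `-P', they can be
filtered out afterwards.  A minimal sketch, assuming the markers look
like `# 12 "guide.src"' (a `#', one space, a line number) and that no
line of the document itself starts that way; the name
strip_linemarkers is made up for illustration:

```shell
#!/bin/sh
# Delete cpp line markers such as:  # 12 "guide.src"
# Assumption: no genuine document line starts with "# <digits>".
strip_linemarkers() {
    sed -e '/^# [0-9][0-9]* /d' -e '/^# [0-9][0-9]*$/d' ${1+"$@"}
}
```

It would sit at the end of the pipeline, e.g.
`/usr/lib/cpp guide.src | strip_linemarkers' (guide.src being a
hypothetical input file).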

\(2)  Leaving <cr> in the text
\
\When I run text like:	I get the output like:		I would like:
\
\	#define foo	before				before
\	before						test
\	#ifdef foo	test				after
\	test
\	#endif		after
\	after

You could use the following script instead of cpp:
----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh

tab="	"

# Build one sed clause per cpp directive: SED1 tags, SED2 untags.
for i in define undef ifdef ifndef if elif else endif
do
	SED1="
		$SED1
		/^[ $tab]*#[ $tab]*$i/{
			s//\\\\&/
			p
			s/.//
			b
		}
	"
			# due to a bug in some sed versions, the `p' mustn't
			# be appended to the previous substitute command
	SED2="
		$SED2
		/^\\\\[ $tab]*#[ $tab]*$i/{
			N
			d
		}
	"
done

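# Tag the directives, preprocess, then delete each tag line together
# with the empty line cpp left behind it.
sed "$SED1" ${1+"$@"} | /lib/cpp -P | sed "$SED2"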
----------8<----------8<----------8<----------8<----------8<----------

The idea is to remember that a line must be removed:

	#define foo bar

becomes

	\#define foo bar
	#define foo bar

cpp will leave the first line intact and change the second to an empty line.
The second invocation of sed will delete both lines.
One limitation: you shouldn't use `/*' and `*/' to comment out text; instead
use:
	#if 0
	...
	#endif 0
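
The two sed stages can also be tried on their own, without cpp in the
middle.  A minimal sketch, with one simplifying assumption that the
script above does not make: here a directive is any line starting with
`#' and a lowercase letter, instead of a list of directive names.  The
function names tag and untag are made up for illustration:

```shell
#!/bin/sh
# Stage 1: put a backslash-tagged copy in front of every directive line.
# Simplification (assumption): a directive is any line that starts with
# '#' followed by a lowercase letter.
tag() {
    sed '/^#[a-z]/{
        s/^/\\/
        p
        s/.//
    }' ${1+"$@"}
}

# Stage 3: delete each tagged line together with the line after it,
# i.e. the empty line cpp leaves in place of the directive.
untag() {
    sed '/^\\#[a-z]/{
        N
        d
    }'
}
```

The full pipeline would then read `tag guide.src | /lib/cpp -P | untag',
where guide.src is a hypothetical input file.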
-- 
   creat(2) shouldn't have been create(2): |Maarten Litmaath @ VU Amsterdam:
      it shouldn't have existed at all.    |maart@cs.vu.nl, mcvax!botter!maart

ok@cs.mu.oz.au (Richard O'Keefe) (09/22/89)

In article <3323@solo10.cs.vu.nl>, maart@cs.vu.nl (Maarten Litmaath) writes:
> One limitation: you shouldn't use `/*' and `*/' to comment out text; instead
> use:
> 	#if 0
> 	...
> 	#endif 0

This advice was correct in the specific context (using cpp for generic text
on a particular operating system).  However, it is not correct for C.
        #endif 0
(a)            ^ this token is not legal in dpANS C; V.3 tends not to like it
(b) You should not use this technique to comment out text in a C program; new
C compilers are allowed to complain about mismatched quotes when they see
"don't" and other such text, and some _do_.

    You're going to laugh, but how about using the Bourne shell as a
condition processing facility?  E.g.
	if [ ... ] ; then
	    cat <<'EndOfPart'
	any old text -- almost
	EndOfPart
	else
	    # equivalent of 'include'
	    cat foo.inc
	fi
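
A slightly fuller sketch of the same idea, assuming the site settings
live in shell variables (HAVE_VMS and SITE are made-up names).  An
unquoted delimiter lets the shell expand variables, which stands in
for macros; a quoted delimiter copies the text verbatim.  The "almost"
is that a line consisting solely of the delimiter ends the text.

```shell
#!/bin/sh
# Site settings; HAVE_VMS and SITE are hypothetical names.
HAVE_VMS=${HAVE_VMS-yes}
SITE=${SITE-"Ohio State"}

render_guide() {
    # Unquoted delimiter: $SITE is expanded, much like a macro.
    cat <<EndOfPart
Welcome to $SITE.
EndOfPart
    if [ "$HAVE_VMS" = yes ]; then
        # Quoted delimiter: the text is copied verbatim, $ and all.
        cat <<'EndOfPart'
On VMS, type: $ DIR [USER.DOCS]
EndOfPart
    fi
}
```

Calling render_guide with HAVE_VMS=no simply leaves the VMS paragraph
out; no blank line is left in its place.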

verber@pacific.mps.ohio-state.edu (Mark A. Verber) (09/22/89)

I would like to thank the >10 people who have responded to my request.
All but one suggested that I use m4.  This would normally be the
best solution (I thought about it too) except that m4 wants
conditionals to be enclosed in quoting characters.  There is no pair of
quoting characters that isn't used in my document.  Square brackets
came the closest to being usable, but the chapter on VMS uses a lot of
[].  I wanted to ifdef large sections of text... multiple paragraphs.
Trying to make sure that I have none of the quoting characters in
the text itself was just too risky.

The solution I am using right now was suggested by maart@cs.vu.nl.
I use sed to tag all the cpp controls, run the document through
cpp, and run it through sed again, looking for the tags and nuking
the tags and the extra <cr>.

maart@cs.vu.nl (Maarten Litmaath) (09/23/89)

ok@cs.mu.oz.au (Richard O'Keefe) writes:
\...     #if 0
\...     ...
\        #endif 0
\(a)            ^ this token is not legal in dpANS C; V.3 tends not to like it

What!?  The rest of the line isn't ignored?  And what was the very good reason
the ANSI committee decided so?

\(b) You should not use this technique to comment out text in a C program; new
\C compilers are allowed to complain about mismatched quotes when they see
\"don't" and other such text, and some _do_.

I remember the discussion, but I don't recall the very good reason (here we
go again) why those compilers could complain.  Quality of implementation?
That's another point for Sun, 'cause their cpp works. :-)

\    You're going to laugh, but how about using the Bourne shell as a
\condition processing facility?  E.g.
\	if [ ... ] ; then
\	    cat <<'EndOfPart'
\	any old text -- almost
\	EndOfPart
\	else
\	    # equivalent of 'include'
\	    cat foo.inc
\	fi

Great!  However, use `test' instead of `[' if portability is an issue. :-(

guy@auspex.auspex.com (Guy Harris) (09/24/89)

>ok@cs.mu.oz.au (Richard O'Keefe) writes:
>\...     #if 0
>\...     ...
>\        #endif 0
>\(a)            ^ this token is not legal in dpANS C; V.3 tends not to like it
>
>What!?  The rest of the line isn't ignored?  And what was the very good reason
>the ANSI committee decided so?

From the Rationale (presented without comment - I neither endorse nor
reject their rationale, I merely present it):

	   Various proposals were considered for permitting text other
	than comments at the end of directives, particularly "#endif"
	and "#else", presumably to label them for easier matchup with
	their corresponding "#if" directives.  The Committee rejected
	all such proposals because of the difficulty of specifying
	exactly what would be permitted, and how the translator would
	have to process it.

Three notes:

	1) V.3 "tends not to like it", but merely spits out a warning -
	   the code will still compile.

	2) If you put "/*" and "*/" around the token, the resulting text
	   is legal (assuming, of course, that there's no "*/" in the
	   token, etc.).

	3) If there already exist compilers that disallow extra tokens
	   like that, one should consider replacing them with comments
	   *anyway*.

>\(b) You should not use this technique to comment out text in a C program; new
>\C compilers are allowed to complain about mismatched quotes when they see
>\"don't" and other such text, and some _do_.
>
>I remember the discussion, but I don't recall the very good reason (here we
>go again) why those compilers could complain.  Quality of implementation?
>That's another point for Sun, 'cause their cpp works. :-)

Sun's "cpp" is derived from various versions of the AT&T preprocessor
(the precise version depends on the SunOS release); the credit doesn't
go to Sun, by and large, for it working in that particular case - you'll
probably find most, if not all, other Reiser-based "cpp"s (well, "cpp"s
based on Reiser's "cpp", not based on Reiser himself, but you knew what
I meant :-)), handle that particular case in the fashion you desire. 

The SunOS 4.x "cpp" is, BTW, based on the S5R3 "cpp", but the warning for
extra tokens at the end of "#else"/"#endif" was put under the control of
a "-p" (for "portability") flag to "cpp" (used, as I remember, only by
the S5-environment "lint" when it is given the "-p" flag) because the
practice was so widespread. 

henry@utzoo.uucp (Henry Spencer) (09/24/89)

In article <3346@solo12.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>\        #endif 0
>\(a)            ^ this token is not legal in dpANS C; V.3 tends not to like it
>
>What!?  The rest of the line isn't ignored?  And what was the very good reason
>the ANSI committee decided so?

Because only a few compilers ignored it, and the rest didn't, and there didn't
seem to be any good reason to perpetuate this irregular accident of certain
implementations.  Try `#endif /* 0 */' if you want to put a comment in.

>\(b) You should not use this technique to comment out text in a C program; new
>\C compilers are allowed to complain about mismatched quotes when they see
>\"don't" and other such text, and some _do_.
>
>I remember the discussion, but I don't recall the very good reason (here we
>go again) why those compilers could complain...

A good many compilers tokenize text before doing anything else with it, even
preprocessing.  To avoid rendering all those implementations illegal, it has
to be possible to do this.  The preprocessor was never specified clearly
enough to say that these implementations are wrong.
-- 
"Where is D.D. Harriman now,   |     Henry Spencer at U of Toronto Zoology
when we really *need* him?"    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

guy@auspex.auspex.com (Guy Harris) (09/24/89)

>I believe (but I'm not sure) that m4 is freely redistributable (but not
>in the public domain, since I think it's Berkeley code).

Well, the M4 that comes with 4.3-tahoe is AT&T code.  There is, however,
a publicly-available M4 in the "comp.sources.unix" archive, in volume 13.

I second your recommendation that people use M4 rather than "cpp" for
this purpose, unless they decide to use the publicly-available "cpp" that
comes with X11; the "cpp" that comes with UNIX systems isn't intended to
be used as a general-purpose macro processor, and many aspects of its
behavior are subject to change, as people have already discovered.

merlyn@iwarp.intel.com (Randal Schwartz) (09/26/89)

In article <838@pacific.mps.ohio-state.edu>, verber@pacific (Mark A. Verber) writes:
[wanting to use cpp for macros and conditionals, and has problems with:]
| (1)  Bloody "# line-number file-name" lines
[example deleted]
| (2)  Leaving <cr> in the text
| 
| When I run text like:	I get the output like:		I would like:
| 
| 	#define foo	before				before
| 	before						test
| 	#ifdef foo	test				after
| 	test
| 	#endif		after
| 	after
| 
| Any suggestions on getting cpp to eat the <cr> at the end of the control
| lines?  Is there a PD cpp or other macro processor that will do the
| job for me... or should I pull out ye old perl manual?

My first inclination would be "yeah, write what you want in Perl",
just because it'd help the sale of my forthcoming book (:-), but
really, cpp is NOT what you want.  Both of your problems are because
cpp output is expected to be fed into the C compiler, and the compiler
wants to keep track of where the source lines were coming from
*before* they were processed.

If you don't want to do it from scratch, how about m4(1)?  You can get
pretty much EXACT control over your input-to-output translation, and
except for the handful of well-defined commands, *nothing* else is
defined.  (Did'ja ever try to have a variable named "vax" on a vax?  I
did.  Sigh. :-)

Just another Perl hacker,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel, Hillsboro, Oregon, USA                           |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/

jik@athena.mit.edu (Jonathan I. Kamens) (09/26/89)

In article <840@pacific.mps.ohio-state.edu> verber@pacific.mps.ohio-state.edu
(Mark A. Verber) writes:
>I would like to thank the >10 people who have responded to my request.
>All but one suggested that I use m4.  This would normally be the
>best solution (I thought about it too) except that m4 wants
>conditionals to be enclosed in quoting characters.  There is no pair of
>quoting characters that isn't used in my document.  Square brackets
>came the closest to being usable, but the chapter on VMS uses a lot of
>[].  I wanted to ifdef large sections of text... multiple paragraphs.
>Trying to make sure that I have none of the quoting characters in
>the text itself was just too risky.

  You can use *anything* as the two quoting characters in an
m4-processed file.  I use ^ and @ in one file.  Heck, I just tried it,
and it appears that even control characters can be used as the quoting
characters.  Surely there are two characters in the ASCII character
set that you don't use in your file? :-)
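
A minimal sketch of what that could look like, assuming ^ and @ do not
occur in the document text.  SITE and HAVE_VMS are made-up macro names
and render_with_m4 is a hypothetical wrapper; with the quotes changed,
the square brackets in the VMS text pass through untouched:

```shell
#!/bin/sh
# Run a document fragment through m4 with ^ and @ as the quote pair.
# SITE and HAVE_VMS are hypothetical macro names for illustration.
render_with_m4() {
    m4 ${1+"$@"} <<'EndOfInput'
changequote(^,@)dnl
define(^SITE@, ^Ohio State@)dnl
ifdef(^HAVE_VMS@, ^On VMS, directories look like [USER.DOCS].
@)dnl
Welcome to SITE.
EndOfInput
}
```

Running `render_with_m4 -DHAVE_VMS' includes the VMS sentence; without
the -D option it is dropped, and no empty line is left behind because
the newline is part of the quoted text.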
