[comp.text.tex] TeX macros to format C without outside assistance

emcmanus@cs.tcd.ie (Eamonn McManus) (03/06/90)

In <1704@tws8.cs.tcd.ie>, I posted macros that allow TeX to input a C file
directly and format it in a fairly reasonable way.  Since then I've fixed
some bugs and made some changes, so I'm now posting the modified file.
LaTeX users should install this as cprog.sty; plain TeX users as
cprog.tex.  The comments in the file itself explain how to use it, and what
parameters can be changed by the user.  As before, I would appreciate
comments and suggestions about this package.

In subsequent postings under the same subject I will post a test file for
these macros and a discussion of some improvements that are possible but
difficult to make.


% cprog.tex (or cprog.sty) - formatting of C programs
% By Eamonn McManus <emcmanus@cs.tcd.ie>.  This file is not copyrighted.
% $Header: cprog.tex,v 1.1 90/03/05 20:02:39 emcmanus Exp $

% This allows C programs to be formatted directly by TeX.  It can be
% invoked by \cprogfile{filename} or (in LaTeX) \begin{cprog} ...
% \end{cprog} or (in plain TeX) \cprog ... \end{cprog}.
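%
% For example (a sketch; hello.c and the one-line program are placeholders):
%   LaTeX:      \documentstyle[cprog]{article}
%               ...
%               \cprogfile{hello.c}
%               \begin{cprog}
%               int main() { return 0; }
%               \end{cprog}
%   Plain TeX:  \input cprog
%               \cprog
%               int main() { return 0; }
%               \end{cprog}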

% The formatting is (necessarily) simple.  C text is set in a normal Roman
% font, comments in a slanted font, and strings in a typewriter font, with
% spaces made visible as the `square u' symbol.  Tabs are expanded to four
% spaces (this does not look good when comments are aligned to the right of
% program text).  Some pairs of input characters appear as single output
% characters: << <= >> >= != -> are respectively TeX's \ll \le \gg \ge \ne
% \rightarrow.
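%
% For instance, the input line
%     if (p->next != NULL && n <= 0) x <<= 1;
% is set roughly as if it had been written
%     if (p$\rightarrow$next $\ne$ NULL && n $\le$ 0) x $\ll$= 1;
% with everything outside the math in \ctextfont.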

% The fonts below can be changed to alter the setting of the various parts
% of the program.  The \cprogbaselineskip parameter can be altered to
% change the line spacing.  LaTeX's \baselinestretch is taken into account
% too.  The indentation applied to the whole program is \cprogindent,
% initially 0.  Before and after the program there are skips of
% \beforecprogskip and \aftercprogskip; the default values are \parskip
% and 0 respectively (since there will often be a \parskip after the
% program anyway).
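%
% For example (illustrative values only), a document might set
%   \cprogbaselineskip=11pt   \cprogindent=2em
%   \beforecprogskip=6pt plus 2pt   \aftercprogskip=6pt plus 2pt
% after this file has been loaded, because loading it assigns the
% defaults.  In LaTeX the line spacing actually used is \baselinestretch
% times \cprogbaselineskip.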

% This package works by making a large number of characters active.  Since
% even spaces are active, it is possible to examine the next character in
% a macro by making it a parameter, rather than using \futurelet as one
% would normally do.  This is more convenient, but it means that if the
% next character is itself active and examines the character after it,
% it may see a token from the macro body rather than the input.  I think that
% all cases that occur in practice have been looked after.
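%
% A sketch of the two styles for an active ! that looks for a following =
% (\bangnext, \bangtest and \gobble are hypothetical names, not defined
% in this file).  By parameter, as the code below does:
%   \def!#1{\ifx=#1$\ne$\else\string!#1\fi}
% By \futurelet, the usual approach:
%   \def\gobble#1{}
%   \def!{\futurelet\bangnext\bangtest}
%   \def\bangtest{\ifx=\bangnext $\ne$\expandafter\gobble
%     \else\string!\fi}
% In the first form, if #1 is an active character that itself takes an
% argument (such as -), that argument is the \fi from the macro body, not
% the next input character; the < and > macros below use \aftergroup to
% avoid this.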

% The macros were thrown together rather quickly, and could do with some
% work.  For example, the big macro defined with @[] taking the place of
% \{} could be recoded to use \{} and so be more legible.  The grouping of
% two-character pairs should be controllable, since not everyone will want
% it.  The internal macros etc. should have @ in their names, and should be
% checked against LaTeX macros for clashes.

% Allow multiple inclusion to go faster.
\ifx\undefined\cprogsetup	% The whole file.

% Define the fonts used for program text, comments, and strings.
% Note that if \it is used for \ccommentfont, something will need to
% be done about $ signs, which come out as pounds sterling.
\let\ctextfont=\rm \let\ccommentfont=\sl \let\cstringfont=\tt

% Parameters.  Unfortunately \newdimen is \outer (\outerness is a mistake)
% so we need a subterfuge in case we are skipping the file.
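% (For example, if \newdimen\cprogindent were written out directly, then
% when this file is input a second time and the \ifx above is false, TeX
% would meet the \outer \newdimen while skipping and stop with
% "Incomplete \ifx; all text was ignored after line ...".  The \csname
% form only produces the \newdimen token when the line is executed.)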
\csname newdimen\endcsname\cprogbaselineskip \cprogbaselineskip=\baselineskip
\csname newdimen\endcsname\cprogindent \cprogindent=0pt
\csname newskip\endcsname\beforecprogskip \beforecprogskip=\parskip
\csname newskip\endcsname\aftercprogskip \aftercprogskip=0pt

\def\makeactive#1{\catcode`#1=\active} \def\makeother#1{\catcode`#1=12}
{\obeyspaces\gdef\activespace{ } \obeylines\gdef\activecr{^^M}}
\def\spacewidthof{\fontdimen2}	% Width of a space in the following font.

% The following group makes many characters active, so that their catcodes
% in the \cprogchars macro are active, allowing them to be defined.  We
% could alternatively define more stuff like \activebackslash and use
% \expandafter or (carefully) \edef to expand these in the macro.
\begingroup
\catcode`@=\catcode`\\ \catcode`[=\catcode`{ \catcode`]=\catcode`}
\catcode9=\active
\makeactive! \makeactive" \makeactive' \makeactive* \makeactive- \makeactive/
\makeactive< \makeactive> \makeactive\{ \makeactive\} \makeactive|
\makeactive\\
@gdef@activebackslash[\]@gdef@activestar[*]
@gdef@cprogchars[% Don't indent this macro with tabs!  They are active.
    @makeother##@makeother$@makeother&@makeother@%@makeother^%
    @makeactive"@makeactive'@makeactive*@makeactive-@makeactive/%
    @makeactive<@makeactive>@makeactive{@makeactive}@makeactive|%
    @makeactive!@makeactive\@makeactive_%
    @def!##1[@ifx=##1$@ne$@else@string!##1@fi]%
    @def-##1[@ifx>##1$@rightarrow$@else$@string-$##1@fi]%
    @def"[@cquote"]@def'[@cquote']@def*[$@string*$]%
    % We use \aftergroup in < and > to deal with the fact that #1 might
    % itself examine the following character.
    @def<##1[[$@ifx<##1@ll$@else@ifx=##1@le$@else
      @string<$@aftergroup##1@fi@fi]]%
    @def>##1[[$@ifx>##1@gg$@else@ifx=##1@ge$@else
      @string>$@aftergroup##1@fi@fi]]%
    @def{[$@string{$]@def}[$@string}$]%
    @def|[$@string|$]@def\[$@backslash$]@def~[$@sim$]%
    @let/=@ccomment
    @obeyspaces @expandafter@def@activespace[@leavevmode@space]%
    @catcode9=@active @def^^I[@ @ @ @ ]%
    @obeylines @expandafter@def@activecr[@strut@par]]
    % This macro is illegible.
@gdef@cprogarg#1\end{cprog}[@eatcr#1@endcprogarg]
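@endgroup
% \cprogchars makes _ active, but nothing above gives it a meaning, so
% define it here to print an underscore.
{\makeactive\_ \gdef_{\_}}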

\begingroup \makeactive" \makeactive'
\gdef\cquote#1{% #1 is the quote, " or '.
    \begingroup \tt\string#1\cstringfont \makeactive\\%
    \expandafter\let\activebackslash\quotebackslash
    \expandafter\def\activespace{\char`\ }%
    \expandafter\let\activecr=\unclosedstring
    \makeother*\makeother-\makeother/\makeother<\makeother>\makeother\{%
    \makeother\}\makeother_\makeother|%
    \ifx"#1\def'{\char13}\else\makeother"\fi
    \def#1{\tt\string#1\endgroup}}
\endgroup
\def\unclosedstring{%
    \errhelp{A string or character constant earlier in the line was unclosed.^^J
So I'm closing it now.}%
    \errmessage{Unclosed string}%
    \endgroup}
\newlinechar=`^^J
\def\quotebackslash#1{\char`\\%
    \expandafter\ifx\activecr#1\strut\par
      \else\string#1\fi}

% In a comment, we shrink the width of the opening / to that of a space so
% that the stars in multiline comments will line up.  We also shrink the
% closing * for symmetry.
\def\spacebox#1{\hbox to \spacewidthof\font{#1\hss}}
\begingroup \makeactive*
\gdef\ccomment#1{%
    \ifx#1*\begingroup \leavevmode \ccommentfont
      % We want the width of a space in \ccommentfont, not \ctextfont.
      \spacebox{\ctextfont\string/}*%
      \makeother-\makeother'\makeother"\makeother/%
      \makeactive*\let*=\commentstar
    \else \leavevmode\string/#1\kern-1pt %
    \fi}
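\makeother*
% #1 may be an active / (text already read by {cprog} or \cprog) or an
% other / (text read by \cprogfile, where \ccomment has made / other), so
% compare character codes only.
\gdef\commentstar#1{%
    \if\noexpand#1/\endgroup \spacebox{$*$}\string/\let\next\relax%
    \else $*$\let\next#1%
    \fi\next}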
\endgroup

% We usually have an active ^^M after \cprog or \begin{cprog}.
\def\eatcr#1{{\expandafter\ifx\activecr#1\else\aftergroup#1\fi}}

% Expand to stretch and shrink (plus and minus) of parameter #1.
\def\stretchshrink#1{\expandafter\eatdimenpart\the#1 \end}
\def\eatdimenpart#1 #2\end{#2}
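% Worked example: with \parskip=0pt plus 1pt, \the\parskip yields
% "0.0pt plus 1.0pt"; \eatdimenpart discards "0.0pt " and leaves
% "plus 1.0pt", so \parskip=0pt\stretchshrink\parskip in \cprogsetup
% sets \parskip to 0pt plus 1.0pt.  If \parskip has no stretch or shrink,
% \the\parskip is just "0.0pt" and nothing is left over.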

\ifx\undefined\baselinestretch \def\baselinestretch{1}\fi

\def\cprogsetup{\cprogchars \ctextfont \parskip=0pt\stretchshrink\parskip
    \baselineskip=\baselinestretch\cprogbaselineskip \parindent=\cprogindent
    \vskip\beforecprogskip}
\def\endcprog{\endgroup \vskip\aftercprogskip}
\def\cprogfile#1{\begingroup \cprogsetup \input#1\endcprog}
% The {cprog} environment or \cprog macro reads in all the argument text.
% By making the C definition of \ much cleverer we could avoid this.
\def\cprog{\begingroup \cprogsetup \cprogarg}
% In LaTeX we need to call \end{cprog} properly to close the environment,
% whereas in plain TeX this will end the job.  The test for LaTeX is not
% bulletproof, but most plain TeX documents don't refer to the LaTeX logo.
\ifx\undefined\LaTeX \let\endcprogarg=\endcprog
\else \def\endcprogarg{\end{cprog}}
\fi

\fi	% \ifx\undefined\cprogsetup

\endinput
--
Eamonn McManus <emcmanus@cs.tcd.ie>	<emcmanus%cs.tcd.ie@cunyvm.cuny.edu>
	  One of the 0% of Americans who are not Americans.

emcmanus@cs.tcd.ie (Eamonn McManus) (03/06/90)

As promised, here is the test input I use for the C program macros.
I've tried to include most of the tricky cases, so that people tweaking
the macros can make sure they don't break anything.  There are also
enough examples to give an idea of what formatted C programs look like in
general.  You can see from the three #defines that non-initial tabs
don't work very well.


% tcprog.tex - LaTeX test file for cprog.sty
% This isn't a valid C program, it just tests the cprog macros.
\documentstyle[cprog]{article}	% Plain TeX: \input cprog
\beforecprogskip=0.3in
\aftercprogskip=0.3in
\cprogindent=1in
\begin{document}		% Plain TeX: remove this line.
% Ensure it can be input multiple times; plain TeX: \input cprog
\input cprog.sty
% Paragraph so we can be sure \beforecprogskip works properly.
hello, world
% Start cprog environment.  Plain TeX: \cprog
\begin{cprog}
/* t.c - test file for cprog macros. */

/* This is a comment
 * extending over
 * several lines.
 */

	/* Indented comment,
	 * ditto.
	 */

/**** Yukky comment! ****/

"Funny string, with ', \\, \", etc."
"Funny and \
continued string."

#define ONE	1	/* Unit. */
#define TWO	2	/* Two units. */
#define THREE	3	/* Three units. */

if (!(x & ONE)) {
	foo(a % 5); foo('a'); foo('\'');
	foo(a * 5); foo(a + 5); foo2(a, b);
	foo((int)2.3); foo(a / 5); foo(a ? a : 5);
	if (a < b || a > c) range(a);
	if (a == b) a = c;
	while (p[a] != p[i]) a++;
#define macro (1 + \
		1)
	foo(something_else);
	foo(~a);
	if (a <= b || a >= b || a != b) {
		a <<= b;
		b = a << 5;
		b >>= a;
	}
}
\end{cprog}
% Another paragraph, for \aftercprogskip.
hello again
\end{document}	% Plain TeX: \bye