[comp.text] automated index generation

dbf@myrias.com (David Ferrier) (08/02/89)

Some time ago I requested methods for automatically generating index 
entries, and indexes, for documents being formatted with troff. There was
a good response. Herewith are the responses "summarized"
as best I could (one person shipped quite a bit of code).

David Ferrier                            | computer: a million morons
Edmonton, Alberta                        | working at the speed of light
uunet!myrias.COM!dbf                     |

=========================================================================

                            Table of Contents

This summary of automated indexing methods contains
Usenet articles by the following individuals:

From: Beverly Erlebacher <erlebach@turing.toronto.edu>
From: jpr@dasys1.UUCP (Jean-Pierre Radley)
From: alberta!steve
From: alberta!steve
From: jbw@unix.cis.pittsburgh.edu (Jingbai  Wang)
From: bsa@telotech.UUCP (Brandon S. Allbery)
From: Jim Westervelt <uunet!uiucuxc!osiris.cso.uiuc.edu!westerve>
From: bsa@telotech.uucp (Brandon S. Allbery)

=========================================================================

From: Beverly Erlebacher <erlebach@turing.toronto.edu>
Subject: Re: automatic document indexing

this is a lot easier than you might think.  i learned how from the 
netnews article i am enclosing below.  the article describes a quite
elaborate and customized method - i stripped it down a bit for my use.

i can't tell from your posting how knowledgeable you are about troff.
if you have trouble understanding or using this method and you can't
get help locally, you can write to me.

(summary:  the .tm command causes troff to emit its argument to stderr.
collect these into a file, sort them and format them into an index.)
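
A minimal sketch of the collect-sort-format step, assuming each line
captured from stderr has the form "term<TAB>page".  That layout, the file
names in the comment, and the helper name make_index are illustrative
assumptions, not part of the article that follows:

```shell
#!/bin/sh
# Capture step, shown for context only (file names are hypothetical):
#   troff -mm doc.tr > doc.out 2> index.raw

tab=$(printf '\t')

# make_index: read "term<TAB>page" lines on stdin and print one line per
# term, with its page numbers in ascending order and duplicates dropped.
make_index() {
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n | uniq |
    awk -F '\t' '
        $1 != term {                       # a new term starts here
            if (NR > 1) print line
            term = $1; line = $1 ", " $2
            next
        }
        { line = line ", " $2 }            # same term, another page
        END { if (NR > 0) print line }'
}
```

Something like `make_index < index.raw > index.txt` would then give a plain
list that can be wrapped in whatever formatting macros are at hand.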

yrs, b.

[included article follows]

=========================================================================
From: jpr@dasys1.UUCP (Jean-Pierre Radley)
Subject: Re: Wanted: Documentation Indexing Tools
Date: 18 Oct 88 04:25:54 GMT
Organization: TANGENT
Lines: 113

In article <4597@brspyr1.BRS.Com> bob@brspyr1.BRS.Com (Bob Armao) writes:
>    We use nroff with our own customized macros to construct  and
>    format a permuted index...
>
>    I'm looking for an approach (public domain or otherwise) that
>    will  enable  us  to  produce  a more traditional-looking and
>    effective  index.   For  now  we  have  to  live  within  the
>    restrictions of nroff and family.


Here is how I handle indexing from nroff sources. The scheme is
originally described in Bourne's "The UNIX System".

I define these two macros and include them in a file .macros
(along with other macros pertinent to the project at hand):

	.deIX			\" index macro
	.tm \\$1\a..   \\n(H1-\\nP
	..
	.deCX			\" choose index item macro
	.IX "\\$1, \\$2"
	.IX "\\$2, \\$1"
	..

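Outside troff, the effect of the CX macro (two IX calls with the arguments
swapped) can be mimicked with a small shell function.  This is only a
sketch: the function name cx_entries and the idea of passing the
chapter-page reference as a third argument are assumptions made for
illustration, since in the macros above troff supplies that reference from
its own registers:

```shell
#!/bin/sh
# cx_entries TERM1 TERM2 REF
# Print the two permuted entries that .CX produces through .IX.
# \001 is the SOH byte that troff's \a leader escape yields in .tm output.
cx_entries() {
    printf '%s, %s\001..   %s\n' "$1" "$2" "$3"
    printf '%s, %s\001..   %s\n' "$2" "$1" "$3"
}
```

For example, `cx_entries "/appl" "changing /hdN to" "1-4"` prints one line
beginning "/appl, changing /hdN to" and one beginning
"changing /hdN to, /appl".
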
The first line in each source file is:

	.if!\w@\*(tL@ .so .macros

This shell script (where <TAB> is shown here in place of the actual tab
character itself, i.e. hex 09) assembles all my chapters:

	: broff
	# run files through nroff with jpr's mm macros  to
	# `book', including index at end

	nroff -rO0 -rW64 -mmjpr ch.* > .book 2> .index 

	sed "
	s/.*/&~+~&/
	:repeat
	s/\\\\[fs][+-]*.\(.*~+~\)/\1/
	t repeat
	" .index | sort -fd | sed "s/.*~+~//" | awk '
	BEGIN	{ print ".af P a"
		  print ".PH @@\fBINDEX\fR@@"
		  print ".PF @@\\\\\\\\nP@@"
		  print ".TS"
		  print "l r."
		  print }
		{ print }
	END	{ print ".TE" }
	' | tbl | nroff -rO0 -rW64 -mmjpr >> .book

	sed "
	s/^\(.\)/<TAB>\1/
	:spc2tab
	s/<TAB>        /<TAB><TAB>/
	t spc2tab
	" .book > book
	rm .book .index
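
The first sed/sort/sed pipeline in the script is a decorate-sort-undecorate
pass: each line is duplicated around a "~+~" marker, font and size escapes
are stripped from the first copy only, the lines are sorted on that cleaned
copy, and the copy is then thrown away.  A minimal standalone sketch of just
that step follows; the function name is made up, and LC_ALL=C is added here
only to make the ordering predictable:

```shell
#!/bin/sh
# sort_ignoring_fonts: sort index lines on stdin as if \fX and \s+N escapes
# were absent, while leaving the escapes in the output untouched.
sort_ignoring_fonts() {
    sed '
s/.*/&~+~&/
:repeat
s/\\[fs][+-]*.\(.*~+~\)/\1/
t repeat
' | LC_ALL=C sort -fd | sed 's/.*~+~//'
}
```

Because the script is single-quoted here, one backslash pair is enough; the
original uses double quotes, which is why it needs four backslashes.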

At any place in the source file where I judge an index reference
would be useful, I imbed a call to the CX macro. Here's a fragment
from one of my source files:

	.P
	In that case why have to refer to the files on that drive
	as "/hdN/appl/.../.../..."?
	Why not dispense with the "/hdN" part?
	.CX "/appl" "changing /hdN to"
	.CX "/hdN" "changing to /appl"
	.P
	.I
	Case 1: Changing mount directory /hdN to /appl
	.R

The .index file will have in it (where <SOH> represents ^A, hex 01):

	\fB/etc/default/pfpath\fR, secondary drive name<SOH>..   1-4
	secondary drive name, \fB/etc/default/pfpath\fR<SOH>..   1-4
	/appl, changing /hdN to<SOH>..   1-4
	changing /hdN to, /appl<SOH>..   1-4
	/hdN, changing to /appl<SOH>..   1-4
	changing to /appl, /hdN<SOH>..   1-4
	tar, moving hierarchies<SOH>..   1-4
	moving hierarchies, tar<SOH>..   1-4

which are various index references I have created. As the source
is being formatted, these entries get created with the chapter
number (1), and with the output page number (4) within that chapter.
Finally, all the chapters are assembled into one file, at the end
of which appear these fragments:

	512 bytes, blocking.......................................   1-6
	/appl, changing /hdN to...................................   1-4
	background processes, wait................................   7-4
	....
	blocking, key segment.....................................   1-7
	changing /hdN to, /appl...................................   1-4
	changing to /appl, /hdN...................................   1-4
	...
	fortune cookies, /etc/rc.user.............................   1-5
	/hdN, changing to /appl...................................   1-4
	index, 13-byte overhead...................................   3-7
	...
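
In the script above the dotted leaders come from troff's leader character
and tbl.  For a quick preview without running the formatter, the same
layout can be approximated in awk.  This sketch assumes input lines of the
form "entry<TAB>reference" and a hypothetical helper name dot_leaders;
neither is part of the original scheme:

```shell
#!/bin/sh
# dot_leaders [WIDTH]
# Pad each entry with dots up to WIDTH columns (default 58), then append
# three spaces and the page reference.
dot_leaders() {
    awk -F '\t' -v width="${1:-58}" '{
        dots = ""
        for (i = length($1); i < width; i++)
            dots = dots "."
        print $1 dots "   " $2
    }'
}
```

Entries longer than WIDTH simply get no dots.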

Should any of the arguments to .CX contain font-change stuff, this
will carry through to the final index. In the actual case, I have
/etc/rc.user in boldface in the final text, and in the index as well.
-- 

Time is nature's way of				Jean-Pierre Radley
making sure that everything			jpr@dasys1.UUCP
doesn't happen all at once.			CIS: 76120,1341




=========================================================================
From: alberta!steve

I would imagine that some sort of `sed' script (to convert the 
"key words" into a macro call) and a small macro and a short shell script
to fix things up (like sort the results and throw out extra references) is
all that would be necessary (one of the classic books by Kernighan or
Bourne had an example of doing this).  I think he found what I would agree
is the case -- in general this isn't a good idea (though it is not hard to do).
(the -msun macros build a TOC from the section heading macro as another
example of the basic principles).
	steve.

PS I am reasonably sure that the book I am thinking of is "The UNIX
System" by S.R. Bourne (ask Ted Bentley).


=========================================================================
From: alberta!steve

Whilst looking for something totally different on my bookshelf
I managed to come across AT&T Bell Laboratories Computing Science
Technical report #128 titled "Tools for Printing Indexes" by Jon Bentley
(of Programming Pearls fame) and Brian Kernighan.  Besides echoing the
wisdom (or being the source of it) from yesterday's note about how futile
auto indexing is, it gives the source code (make files, shell scripts,
troff macros, and awk scripts) to deal with an index once generated.
	steve.

=========================================================================
From: jbw@unix.cis.pittsburgh.edu (Jingbai  Wang)
Organization: Univ. of Pittsburgh, Comp & Info Services

I have created something called `indexor' in C for introducing index
entries and glossary entries in a Scribe .mss file (for all four Scribe
commands: @index, @indexentry, @seealso, @indexsecondary) and for the
LaTeX \index command. It can be set up for any program like troff. It
was designed for VMS, DOS and UNIX with an ANSI terminal, but I haven't
fully tested it on UNIX. The program itself is part of my Scribe
TEC.mak database package (distributed freely, and very powerful). If
you are interested, I will make it available. If I find time to debug
it on UNIX (BSD first), I will do so shortly.


JB Wang
jbw@pittvms.bitnet
jbw@unix.cis.pittsburgh.edu 


=========================================================================
From: bsa@telotech.UUCP (Brandon S. Allbery)
Organization: _
	      telotech, inc. - Beachwood, OH

I assembled some klugey stuff for troff -mm (should work under other macro
packages, but the index formatting macros produce -mm-like output).  Ask and
ye shall receive.

++Brandon


=========================================================================
From: Jim Westervelt <uunet!uiucuxc!osiris.cso.uiuc.edu!westerve>

A co-worker hassled with this for months while generating a 300-page manual.
Write him for information:  shapiro@osiris.cso.uiuc.edu


=========================================================================
From: bsa@telotech.uucp (Brandon S. Allbery)
Organization: Telotech, Inc.
Lines: 881

This is the index-generating package I whipped together for ditroff.  It isn't
particularly pretty, but it does the job.

It has been pointed out to me that my use of the term "automatic indexing" was
ambiguous.  I take it as a matter of faith that one cannot automate the
placement of index "tags" in a source file; no computer program can identify
all references that should or should not be tagged, or even most of them.
What I meant by "automatic" is that it takes no editing or other hackery on
temporary files to produce the index therefrom; I've heard that AT&T has an
(unreleased) package that provides similar functionality to this package but
requires multiple passes over the source file to get it right.

++Brandon
#--------------------------------CUT HERE-------------------------------------
#! /bin/sh
#
# This is a shell archive.  Save this into a file, edit it
# and delete all lines above this comment.  Then give this
# file to sh by executing the command "sh file".  The files
# will be extracted into the current directory owned by
# you with default permissions.
#
# The files contained herein are:
#
# -rw-r--r--   1 bsa      other       7068 Jul 27 23:04 README
# -rw-rw-rw-   1 bsa      other        834 Jul 27 23:08 index.mm
# -rw-r--r--   1 bsa      other       9239 May 26 10:10 xsort.c
# -rw-r--r--   1 bsa      other       2426 May 31 16:34 tmac.mx
#
echo 'x - README'
if test -f README; then echo 'shar: not overwriting README'; else
sed 's/^X//' << '________This_Is_The_END________' > README
XA Multi-Level Index Package for ditroff					7/28/89
X---------------------------------------
X
XThis package contains two sets of macro packages and a C program.  The program
Xis written for System V (and not particularly cleanly; I was in something of a
Xhurry when I hacked it together); the macro packages are designed to avoid
Xcollisions with known macro packages (and probably fails in the case of one I
Xdon't know such as Berkeley -me).
X
XThe file "index.mm" contains definitions for the indexing macros.  They are:
X
X.sX primary [secondary [-]]
X	Insert an index reference for the specified primary and optional
X	secondary topics.  Also inserts a "see" reference for the primary
X	topic constructed by concatenating the two topics in reverse,
X	as if a `.nX "secondary primary"' command were issued, unless the
X	optional third argument is non-null.
X
X.dX primary [secondary [-]]
X	Like .sX, except that the resulting index reference is boldfaced.
X	This is intended to highlight `defining' entries or other points
X	of special interest.
X
X.nX primary [secondary]
X	Inserts a "see/see also" index reference for the topic constructed
X	by concatenating the secondary and primary topics in reverse order
X	with a space between them, pointing to the topic(s) of the most
X	recent `.sX' command.  As distributed, the "see/see also" reference
X	points only to the primary topic:
X
X		.sX widget strange
X			-> "widget, strange"
X			-> "strange widget (see widget)"
X
X.aX primary [secondary]
X	Inserts a "see/see also" for the current topic, pointing to the
X	specified topic.  By default, it points to the complete topic,
X	unlike `.nX'.
X
X		.sX widget strange
X		.aX thingammies
X
X	produces
X
X		widget,
X		  strange, 27 (see also thingammies)
X
X.rX prim-1 sec-1 prim-2 sec-2
X	Produces a "see also" for topic-1 pointing to topic-2.  It
X	currently prints a warning on the terminal because I have
X	replaced it with the `.nX' and `.aX' macros; however, in
X	some unusual circumstances it may be desirable to force a
X	linkage in this manner.
X
X.pX
X	Writes the current page number plus one to the indexer, to set
X	the initial page number for the index.  I suggest calling this
X	only once; if you don't call it, it will be called automatically
X	as an endmacro (see below).
X
XThe index.mm file also plants an endmacro (via the `.em' request) which writes
Xthe last page number printed to the index temporary; if you are using a macro
Xpackage which generates table of contents and/or cover sheet pages, or other
Xpages which print after the main text body but are to be placed before the
Xtext after printing, you must call `.pX' at the end of the text body.  (This
Xmacro automatically disengages the endmacro, to avoid writing the page number
Xtwice.)
X
XTo generate the index, run `troff' with standard error redirected to a file,
Xthen run `xsort' on the file.  (Any `troff' messages will be displayed while
X`xsort' is running.)  The output of `xsort' may be saved to a file or piped
Xdirectly to `troff -mmx'.  (Personally, I use a script which uses a named pipe
Xto do all this concurrently:
X
X	mknod /tmp/$$ p
X	troff ... index.mm ... 2> /tmp/$$ &
X	xsort < /tmp/$$ | troff -mmx
X	rm /tmp/$$
X
XThat way the hackery is pretty much invisible.)
X
XThe `xsort' program does a number of things:
X
X* It collects lines from its input whose second character is a BEL (\7, ^G)
X  and whose first character is not a BEL
X
X* It prints on its own standard error lines which do not fit the above
X  criteria, thus effectively passing error messages untouched
X
X* It checks the index entries for certain features which can't be handled
X  properly, such as unexpanded string and number registers and some escapes
X  like `\k'
X
X* It coalesces index entries into page ranges, splitting out defining
X  entries (multiple defining entries might break this)
X
X* It creates copies of the index entries by converting all text to uppercase
X  and stripping all escapes, then sorts the list of index entries by these
X  copies
X
X* It inserts alphabet headings for each initial letter found in the sorted
X  index file
X
X* It writes out the resulting sorted file in the format expected by the -mmx
X  macros (see below)
X
XIt places any page-number entries at the top of the file; since the sort
Xordering of these is undefined, multiple page number entries will produce
Xrandom page numbering in the index.  Other entries are sorted such that
Xprimary topics are in alphabetical order, with entries lacking secondary topic
Xpreceding secondary topics (also in alphabetical order), then page number
Xranges in numeric order, then "see"/"see also" entries.
X
XThe final component of the package is the index formatting macros.  These are
Xtuned for my current project; you will probably want to hack on them,
Xespecially since they're set up to look like the -mm macro package's output
X(modulo header fonts, which I also changed in my document).  These could use
Xrewriting to be a bit more adjustable, although most of the important values
Xare stored right at the top of the file.
X
XThe index produced is two-column, with continuation headings at the top of
Xeach column if needed.  A centered header precedes the entries for a
Xparticular letter.  A primary entry may have any or all of:  page references,
X"see/see also" entries, and subtopics which may themselves have page
Xreferences and/or "see/see also" entries.  If a topic has only one "see/see
Xalso" entry, it gets a "see" entry; if it also has page entries, it gets a
X"see also" entry; if it has only "see/see also" entries, the first is a "see"
Xentry and the others are "see also" entries:
X
X	foo, 1-2, \fB7\fP (\fIalso see\fP baz)
X	  bar (\fIsee\fP gunk) (\fIalso see\fP
X	      widget)
X
XEntries are wrapped to fit their columns; wrapped lines get more indentation
Xthan secondary topics, as shown above.  This is true even of wrapped primary
Xtopics.  The word(s) "see"/"also see" are printed in italic.
X
XThe macros defined in the -mmx package are:
X
X.@P page
X	Sets the page number for the first page of the index.  If not
X	called, defaults to 1.
X
X.aH head
X	Prints a header for a group of index entries.  `xsort' uses this
X	to print alphabetic headers.
X
X.eX primary secondary firstpage [lastpage]
X	Prints a page-number range.  The page numbers are separated by
X	an en-dash.  (You're free to hack the macros to change this; I
X	set this up before I learned that most indexes use em-dashes
X	for this.)  If the second page number is omitted, only a single
X	page number is printed.
X
X.dX primary secondary firstpage [lastpage]
X	Prints a page-number range as with `.eX', but in boldface.
X	Typically called with only three arguments, although nothing
X	requires this.
X
X.aX prim-1 sec-1 prim-2 sec-2
X	Prints a "see/see also" entry from topic-1 to topic-2.
X
XThis stuff is far from clean; I hacked the whole thing together in an
Xafternoon in order to whip up an index for a large document I'm working on.
XOn the other hand, it works, except for occasional surprises in the last item
Xof the index output -- which may be an artifact of the printer.
X
X++Brandon				       _
XBrandon S. Allbery, speaking ex cathedra from telotech, inc.
________This_Is_The_END________
if test `wc -c < README` -ne 7068; then
	echo 'shar: README was damaged during transit (should have been 7068 bytes)'
fi
fi		; : end of overwriting check
echo 'x - index.mm'
if test -f index.mm; then echo 'shar: not overwriting index.mm'; else
sed 's/^X//' << '________This_Is_The_END________' > index.mm
X.de sX
X.nr l. \\n(.c-\\n(L.
X.tm X\\$1\\$2\\n%
X.\"if !\\w\\$3 .if \\w\\$2 .tm R\\$2 \\$1\\$1\\$2
X.ds x1 \\$1
X.ds x2 \\$2
X..
X.\" Obsolete indirect index reference ("foo (see bar)")
X.de rX
X.nr l. \\n(.c-\\n(L.
X.tm .rX (\\*(S. \\n(l.) (OBSOLETE)
X.tm R\\$1\\$2\\$3\\$4
X..
X.\" New form of .rX -- should subsume all uses thereof
X.de nX
X.nr l. \\n(.c-\\n(L.
X.tm R\\$1\\$2\\*(x1\\*(x2
X..
X.\" "Also see" (.nX in reverse order)
X.de aX
X.nr l. \\n(.c-\\n(L.
X.tm R\\*(x1\\*(x2\\$1\\$2
X..
X.\" Defining use of item (boldfaced in index output)
X.de dX
X.nr l. \\n(.c-\\n(L.
X.tm x\\$1\\$2\\n%
X.if !\\w\\$3 .if \\w\\$2 .tm R\\$2 \\$1\\$1\\$2
X.ds x1 \\$1
X.ds x2 \\$2
X..
X.\" Force a page-number entry at end of run
X.\" Can be called explicitly, overriding endmacro call
X.de pX
X.em
X.nr P. \\n%+1
X.tm P\\n(P.
X..
X.em eM
________This_Is_The_END________
if test `wc -c < index.mm` -ne 834; then
	echo 'shar: index.mm was damaged during transit (should have been 834 bytes)'
fi
fi		; : end of overwriting check
echo 'x - xsort.c'
if test -f xsort.c; then echo 'shar: not overwriting xsort.c'; else
sed 's/^X//' << '________This_Is_The_END________' > xsort.c
X/*
X * Sort an index specification file.  The ordering is c1 c2 c0 c3.
X * Sorting is done with all troff escape sequences removed and case-
X * independently.
X */
X
X#include <stdio.h>
X
Xextern void *malloc();
Xextern void *realloc();
X
X/* The sort command.							*/
X
Xchar sort[] = "exec /bin/sort -o INDEX.TMP -t\7 +0 -2 +2r -3 +3 -5";
X
X/* Map characters to case-independent versions of characters for sorting */
Xchar cic[] = 
X{
X    0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007,
X    0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017,
X    0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027,
X    0030, 0031, 0032, 0033, 0034, 0035, 0036, 0037,
X    0040, 0041, 0042, 0043, 0044, 0045, 0046, 0047,
X    0050, 0051, 0052, 0053, 0054, 0055, 0056, 0057,
X    0060, 0061, 0062, 0063, 0064, 0065, 0066, 0067,
X    0070, 0071, 0072, 0073, 0074, 0075, 0076, 0077,
X    0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
X    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
X    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
X    0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
X    0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
X    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
X    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
X    0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177,
X    0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
X    0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
X    0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
X    0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
X    0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
X    0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
X    0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
X    0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
X    0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307,
X    0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317,
X    0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327,
X    0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337,
X    0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
X    0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
X    0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
X    0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377,
X};
X
X/* Error processing.							*/
X
X#define srterr()	error("Cannot start sort process")
X#define tmperr()	error("Cannot open sort file")
X#define memerr()	error("Out of memory")
X#define typerr()	error("Type code must be `P', `X', `x', or `R'")
X#define idxerr()	error("Construct not indexable")
X
Xint lineno;
X
Xerror(s)
X    char *s;
X{
X    fprintf(stderr, "xsort: %s on input line %d\n", s, lineno);
X    exit(1);
X}
X
X/* Make an independent copy of a string.				*/
X
Xchar *
Xsavestr(s)
X    char *s;
X{
X    char *new;
X
X    if (!s)
X	s = "";
X    if (!(new = malloc(strlen(s) + 1)))
X	memerr();
X    strcpy(new, s);
X    return new;
X}
X
X/* Output a string in a useful form.					*/
X
X#define emit(str)	printf(" \"%s\"", (str))
X#define emitrng(n1,n2)	printf(" %d %d", (n1), (n2))
X
X/* Return a "normalized" version of a string.				*/
X
Xchar *
Xderoff(s)
X    register char *s;
X{
X    char buf[1024];
X    int delim;
X    register char *cp;
X
X    for (cp = buf; *s; s++)
X    {
X	if (*s != '\\')
X	    *cp++ = cic[*s];
X	else
X	{
X	    switch (*++s)
X	    {
X	    case '\\':
X	    case '\'':
X	    case '`':
X	    case '-':
X	    case '.':
X	    case ' ':
X	    case '0':
X	    case '|':
X	    case '^':
X	    case '&':
X	    case '!':
X	    case '"':
X	    default:
X		*cp++ = *s;
X		break;
X	    case '%':
X	    case 'a':
X	    case 'c':
X	    case 'd':
X	    case 'p':
X	    case 'r':
X	    case 'u':
X		break;
X	    case 't':
X		*cp++ = '\t';
X		break;
X	    case 'e':
X		*cp++ = '\\';
X		break;
X	    case '(':
X		*cp++ = '\177';
X		s += 2;
X		break;
X	    case '$':
X	    case '*':
X	    case 'b':
X	    case 'D':
X	    case 'g':
X	    case 'j':
X	    case 'n':
X	    case 'o':
X	    case 'w':
X	    case 'z':
X		idxerr();
X	    case 'f':
X		if (*++s == '(')
X		    s += 2;
X		break;
X	    case 's':
X		if (*++s == '+' || *s == '-')
X		    s++;
X		s++;
X		break;
X	    case 'h':
X	    case 'H':
X	    case 'l':
X	    case 'L':
X	    case 'S':
X	    case 'v':
X	    case 'x':
X		delim = *++s;
X		for (s++; *s != delim; s++)
X		    ;
X		break;
X	    }
X	}
X    }
X    *cp = '\0';
X    return savestr(buf);
X}
X
X/* Convert a number to a string with leading zeroes, for alpha sort.	*/
X
Xchar *stringify(n)
X{
X    char buf[8];
X
X    sprintf(buf, "%5.5d", n);
X    return savestr(buf);
X}
X
X/*
X * An input line is made up of 5 fields, containing a type code, first
X * and second index marks, page number or first alternative mark, and
X * second alternative mark.  Some of these may be null.  For each input
X * line, we split it into its component fields, save the original form
X * of each field and its sortable version (normalized to uppercase and
X * with troff escapes deleted) and stuff it into our sort buffer.  After
X * all lines have been read, the sort buffer is sorted and the external
X * forms of each entry are printed.
X */
X
Xmain()
X{
X    char buf[10240];
X    register char *cp;
X    char *c0, *c1, *c2, *c3, *c4, *s0, *s1, *s2, *s3, *s4;
X    char *os1, *os2, *os3, *os4, *oc1, *oc2;
X    FILE *sortproc;
X    int firstpage, lastpage, curpage, lastcmd;
X
X    if (!(sortproc = popen(sort, "w")))
X	srterr();
X    while (gets(buf))
X    {
X	lineno++;
X	cp = buf;
X	for (c0 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	if (!*cp)
X	{
X	    fprintf(stderr, "%s\n", buf);
X	    continue;
X	}
X	*cp = '\0';
X	c0 = savestr(c0);
X	*cp++ = '\7';
X	for (c1 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	if (!*cp)
X	{
X	    fprintf(stderr, "%s\n", buf);
X	    free(c0);
X	    continue;
X	}
X	*cp = '\0';
X	c1 = savestr(c1);
X	*cp++ = '\7';
X	for (c2 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	if (!*cp)
X	{
X	    fprintf(stderr, "%s\n", buf);
X	    free(c1);
X	    free(c0);
X	    continue;
X	}
X	*cp = '\0';
X	c2 = savestr(c2);
X	*cp++ = '\7';
X	for (c3 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	if (!*cp)
X	{
X	    fprintf(stderr, "%s\n", buf);
X	    free(c2);
X	    free(c1);
X	    free(c0);
X	    continue;
X	}
X	*cp = '\0';
X	c3 = savestr(c3);
X	*cp++ = '\7';
X	for (c4 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	if (*cp)
X	{
X	    fprintf(stderr, "%s\n", buf);
X	    free(c3);
X	    free(c2);
X	    free(c1);
X	    free(c0);
X	    continue;
X	}
X	c4 = savestr(c4);
X	if (*c0 != 'X' && *c0 != 'R' && *c0 != 'P' && *c0 != 'x')
X	    typerr();
X	s0 = deroff(c0);
X	s1 = deroff(c1);
X	s2 = deroff(c2);
X	if (*s0 == 'X')
X	{
X	    s3 = stringify(atoi(c3));
X	    if (*c0 == 'X')
X		s4 = stringify(atoi(c4));
X	    else
X		s4 = savestr("");
X	}
X	else
X	{
X	    s3 = deroff(c3);
X	    s4 = deroff(c4);
X	}
X	fprintf(sortproc, "%s\7%s\7%s\7%s\7%s\7%s\7%s\7%s\7%s\7%s\n", s1, s2,
X		s0, s3, s4, c0, c1, c2, c3, c4);
X	free(c0);
X	free(c1);
X	free(c2);
X	free(c3);
X	free(c4);
X	free(s0);
X	free(s1);
X	free(s2);
X	free(s3);
X	free(s4);
X    }
X    if (pclose(sortproc) != 0)
X	srterr();
X    if (!(sortproc = fopen("INDEX.TMP", "r")))
X	tmperr();
X    os1 = savestr("");
X    os2 = savestr("");
X    os3 = savestr("");
X    os4 = savestr("");
X    oc1 = savestr("");
X    oc2 = savestr("");
X    firstpage = 0;
X    lastpage = 0;
X    lastcmd = 0;
X    while (fgets(buf, sizeof buf, sortproc))
X    {
X	for (cp = buf; *cp && *cp != '\n'; cp++)
X	    ;
X	*cp = '\0';
X	s1 = buf;
X	for (cp = s1; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (s2 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (s0 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (s3 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (s4 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (c0 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (c1 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (c2 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (c3 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	for (c4 = cp; *cp && *cp != '\7'; cp++)
X	    ;
X	*cp++ = '\0';
X	/*
X	 * can't use sort -u: need to get 'x' and 'X' ordered as if they were
X	 * the same character, but must NOT uniquify them as if they were the
X	 * same!
X	 */
X	if (lastcmd == *c0 && !strcmp(s1, os1) && !strcmp(s2, os2)
X	    && !strcmp(s3, os3) && !strcmp(s4, os4))
X	    continue;
X	if (lastcmd == 'x' && *c0 == 'X' && !strcmp(s3, os3))
X	    continue;
X	curpage = atoi(c3);
X	if (!strcmp(s1, os1) && !strcmp(s2, os2) && lastcmd == 'X'
X	    && curpage == lastpage + 1)
X	    lastpage++;
X	else if (lastcmd == 'X' && *os1)
X	{
X	    printf(".eX");
X	    emit(oc1);
X	    emit(oc2);
X	    emitrng(firstpage, lastpage);
X	    putchar('\n');
X	    firstpage = 0;
X	    lastpage = 0;
X	}
X	if (cic[*s1] != cic[*os1])
X	    printf(".aH %c\n", cic[*s1]);
X	free(os1);
X	free(os2);
X	free(os3);
X	free(os4);
X	free(oc1);
X	free(oc2);
X	os1 = savestr(s1);
X	os2 = savestr(s2);
X	os3 = savestr(s3);
X	os4 = savestr(s4);
X	oc1 = savestr(c1);
X	oc2 = savestr(c2);
X	lastcmd = *c0;
X	switch (*c0)
X	{
X	case 'x':
X	    printf(".dX");
X	    emit(c1);
X	    emit(c2);
X	    printf(" %d\n", curpage);
X	    break;
X	case 'X':
X	    if (firstpage == 0)
X	    {
X		firstpage = curpage;
X		lastpage = curpage;
X	    }
X	    break;
X	case 'R':
X	    printf(".aX");
X	    emit(c1);
X	    emit(c2);
X	    emit(c3);
X	    emit(c4);
X	    putchar('\n');
X	    break;
X	case 'P':
X	    printf(".@P %s\n", c4);
X	    break;
X	}
X    }
X    if (lastcmd == 'X')
X    {
X	printf(".eX");
X	emit(os1);
X	emit(os2);
X	emitrng(firstpage, lastpage);
X	putchar('\n');
X    }
X    unlink("INDEX.TMP");
X    exit(0);
X}
X
X/*
X * Local Variables:
X * compilation-command: "cc -O -o xsort xsort.c"
X * End:
X */
________This_Is_The_END________
if test `wc -c < xsort.c` -ne 9239; then
	echo 'shar: xsort.c was damaged during transit (should have been 9239 bytes)'
fi
fi		; : end of overwriting check
echo 'x - tmac.mx'
if test -f tmac.mx; then echo 'shar: not overwriting tmac.mx'; else
sed 's/^X//' << '________This_Is_The_END________' > tmac.mx
X.\" -*- Mode: Nroff -*-
X.\" Troff/MM index macros:  2-column indexes designed for troff/MM documents
X.\"
X.\" MANIFEST CONSTANTS
X.\"
X.nr Po \nO			\" page offset
X.nr Pw \nW			\" page width
X.nr Cw (\nWu/2u)-0.25i		\" column width
X.nr Xc 0.375i			\" index continuation indent
X.nr X2 0.125i			\" index 2nd level item indent
X.nr X1 0.000i			\" index 1st level item indent
X.nr Xh 1.000i			\" index header indent
X.\"
X.\" Initialize page
X.\"
X.po \n(Pou
X.lt \n(Pwu
X.ll \n(Pwu
X.sp 8
X.ce
X\f(HBINDEX\fP
X.sp 2
X.mk
X.nr cP 0 1
X.nr x2 \n(Xcu-\n(X2u
X.nr x1 \n(Xcu-\n(X1u
X.ll \n(Cwu
X.in \n(Xcu
X.na
X.ns
X.\"
X.\" Output continue marks if needed -- used by .hP and .fP.
X.\"
X.de cO
X.if \\*(X.\\*(A. \{.ti -\\n(x1u
X\\*(X. (continued)
X.br
X.if \\*(B.\\*(Y. .if \\w\\*(B. \{.ti -\\n(x2u
X\\*(Y. (continued)\c
X., \}\}
X..
X.\"
X.\" Page header trap -- assign top of column
X.\"
X.de hP
X.sp 8
X.nr cP 0 1
X.mk
X.cO
X.ns
X..
X.wh 0 hP
X.\"
X.\" Page footer trap -- if in first col then switch cols else force page
X.\"
X.de fP
X.br
X.ie \\n+(cP<2 \{.po \\n(Pou+(\\n(Pwu/2u)
X.rt
X.ns
X.cO \}
X.el \{.po \\n(Pou
X.sp 5
X.tl ''- % -''
X.bp \}
X..
X.wh -8 fP
X.\"
X.\" Assign page number for first page of index
X.\"
X.de @P
X.nr % \\$1
X..
X.\"
X.\" Alphabetic header
X.\"
X.de aH
X.br
X.ne 6
X.sp 1.5
X.ti \\n(Xhu
X\f(HB\\$1\fP
X.rm X.
X.rm Y.
X.sp 0.5
X..
X.\"
X.\" Index entry:  defining entry
X.\"
X.de dX
X.ds A. \\$1
X.ds B. \\$2
X.if !\\$1\\*(X. \{.br
X.if \\$2\\*(Y. .ne 2
X.ti -\\n(x1u
X.ds X. \\$1
X\\$1\c
X.nr Z. 0
X.rm Y. \}
X.ie \\$2\\*(Y. .nr Z. \\n(Z.+1
X.el \{.br
X.ds Y. \\$2
X.ti -\\n(x2u
X\\$2,
X.nr Z. 0 \}
X.if \\n(Z. ,
X.nr Z. \\n(Z.+1
X\fB\\$3\fP\c
X.rm A.
X.rm B.
X..
X.\"
X.\" Index entry:  primary entry
X.\"
X.de eX
X.ds A. \\$1
X.ds B. \\$2
X.if !\\$1\\*(X. \{.br
X.if \\$2\\*(Y. .ne 2
X.ti -\\n(x1u
X.ds X. \\$1
X\\$1\c
X.nr Z. 0
X.rm Y. \}
X.ie \\$2\\*(Y. .nr Z. \\n(Z.+1
X.el \{.br
X.ds Y. \\$2
X.ti -\\n(x2u
X\\$2,
X.nr Z. 0 \}
X.if \\n(Z. ,
X.nr Z. \\n(Z.+1
X.ie \\$3\\$4 \\$3\c
X.el \\$3\-\\$4\c
X.rm A.
X.rm B.
X..
X.\"
X.\" Index entry:  "see"/"see also"
X.\"
X.de aX
X.ds A. \\$1
X.ds B. \\$2
X.if !\\$1\\*(X. \{.br
X.ti -\\n(x1u
X.ds X. \\$1
X\\$1\c
X.nr Z. 0
X.rm Y. \}
X.if !\\$2\\*(Y. \{.br
X.ds Y. \\$2
X.ti -\\n(x2u
X\\$2
X.nr Z. 0 \}
X\&
X(\c
X.if \\n(Z. \fIalso\fP
X.nr Z. \\n(Z.+1
X\fIsee\fP \\$3\c
X.if \\w\\$4 , \\$4\c
X)
X.rm A.
X.rm B.
X..
X.\"----------------------------------------------------------------------------
X.\"
X.\" Local Variables:
X.\" compilation-command: "run -Dnull"
X.\" End:
X.\"
________This_Is_The_END________
if test `wc -c < tmac.mx` -ne 2426; then
	echo 'shar: tmac.mx was damaged during transit (should have been 2426 bytes)'
fi
fi		; : end of overwriting check
exit 0
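
The README in the archive above says that `xsort' coalesces index entries
into page ranges.  The idea can be sketched in a few lines of sh and awk.
This is not the xsort algorithm itself: it assumes plain "term<TAB>page"
input, ignores defining entries and "see" references, uses a hyphen
instead of an en-dash, and the name coalesce_pages is made up for the
example:

```shell
#!/bin/sh
tab=$(printf '\t')

# coalesce_pages: read "term<TAB>page" lines on stdin and print each term
# once, with runs of consecutive pages collapsed into "first-last".
coalesce_pages() {
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
    awk -F '\t' '
        function range(a, b) { return (a == b) ? a : a "-" b }
        $1 != term {                           # new term: flush the old one
            if (NR > 1) print line ", " range(first, last)
            term = $1; line = $1
            first = last = $2 + 0
            next
        }
        $2 + 0 == last     { next }                # same page cited twice
        $2 + 0 == last + 1 { last = $2 + 0; next } # extends the current run
        {                                          # gap: close the run
            line = line ", " range(first, last)
            first = last = $2 + 0
        }
        END { if (NR > 0) print line ", " range(first, last) }'
}
```

Pages 3, 4, 5 and 9 for one term come out as "3-5, 9".
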