[comp.text] TeX index program available

thewalt@RITZ.ce.cmu.edu (Chris Thewalt) (12/09/89)

I just finished writing a quick little program to convert .idx files
into .ind files that can be included in TeX documents.  It is called
idxtex and some of the features are:
  1) can change fonts for page numbers
  2) collects ranges of pages
  3) supports a simple cross reference scheme
  4) supports multiple indexes (generates multiple .ind files)
  5) sorts properly even when items contain TeX
  6) can override the sort key
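
For illustration only: the idxtex source was not posted, so the following
Python sketch is a guess at how features 5 and 6 could work, not the
program's actual method. It assumes a sort key is obtained by dropping
control sequences and braces and lowercasing what remains, and that an
explicit override simply replaces that key. The name sort_key is
hypothetical.

```python
import re

def sort_key(entry, override=None):
    """Return a key for alphabetizing an index entry that may contain TeX."""
    # Assumption: an explicit override replaces the derived key entirely.
    if override is not None:
        return override.lower()
    text = re.sub(r"\\[A-Za-z]+\s*", "", entry)  # control words such as \em
    text = re.sub(r"\\.", "", text)               # control symbols such as \"
    text = text.replace("{", "").replace("}", "")
    return text.strip().lower()

entries = [
    ("topic", None),
    ("yet another topic", None),
    (r'{\em underst\"{a}nding tex}?', None),
    ('a test of sortby using "ttt"', "ttt"),
]
ordered = sorted(entries, key=lambda e: sort_key(e[0], e[1]))
```

With these four entries the overridden one sorts under "ttt", between
"topic" and the entry whose derived key starts with "understanding".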

Basically I added a sublanguage to the index command:
 \index{stuff}

where stuff can be:

 |1 (optional) beginning of item
 |2 beginning of subitem
 |3 beginning of subsubitem
 |f font to wrap page in
 |s a "see" cross reference, deletes the current page
 |a a "see also" cross reference, keeps current page
 |u use the following string as the sort key
 |i name of alternate index (default is same root name as the .idx file)
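
As a rough sketch of how such an entry might be split into its fields
(Python, illustrative only): the field codes are the ones listed above,
but the splitting rule is an assumption, namely that a field starts at a
"|" followed by one code character and whitespace. The names FIELD_NAMES
and parse_entry are hypothetical, and idxtex itself may parse differently.

```python
import re

# Field codes from the list above; the dictionary keys on the right are
# made-up names used only in this sketch.
FIELD_NAMES = {
    "1": "item", "2": "subitem", "3": "subsubitem",
    "f": "font", "s": "see", "a": "see_also",
    "u": "sort_key", "i": "index_name",
}

def parse_entry(stuff):
    """Split the argument of an index command into named fields."""
    # Assumption: "|" + code + whitespace starts a field; the leading
    # "|1" is optional, so text before the first code is the item.
    parts = re.split(r"\|([123fsaui])\s+", stuff)
    fields = {}
    head = parts[0].strip()
    if head:
        fields["item"] = head
    # re.split with a capturing group alternates code, text, code, text...
    for code, text in zip(parts[1::2], parts[2::2]):
        fields[FIELD_NAMES[code]] = text.strip()
    return fields
```

For instance, parse_entry("topic |2 sub1 |3 subsub1") yields the item,
subitem and subsubitem as three separate fields.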

My big question is where do I put it? Labrea or some other place?

Let me know (or maybe post to comp.sources.misc???)

anyway, here's a sample of what it does in case the description above
wasn't clear:

---------------------------------------------------------------------------
data.idx (the input) :
----------------------
\indexentry{topic}{i}
\indexentry{topic}{ii}
\indexentry{topic}{iii}
\indexentry{topic}{1}
\indexentry{topic}{2}
\indexentry{topic}{3}
\indexentry{topic |f \bf}{4}
\indexentry{topic |f \bf}{5}
\indexentry{topic |f \bf}{6}
\indexentry{topic}{7}
\indexentry{topic |f \em}{9}
\indexentry{topic |2 sub1}{10}
\indexentry{topic |2 sub2 with see xref |s other1}{10}
\indexentry{topic |2 sub3 with see xref, but more refs |s other2}{10}
\indexentry{topic |2 sub3 with see xref, but more refs}{11}
\indexentry{topic |2 subitem with see also xref |a other3}{11}
\indexentry{topic |2 sub1 |3 subsub1}{12}
\indexentry{yet another topic}{12}
\indexentry{{\em underst\"{a}nding tex}?}{12}
\indexentry{topic |2 {\bf foobar}}{13}
\indexentry{topic |2 {\bf foobar} |3 {\Large test}}{13}
\indexentry{a test of sortby using "ttt" |u ttt}{15}

data.ind (output after "idxtex data")
-------------------------------------
\item topic, i, ii, iii, 1--3, {\bf 4}--{\bf 6}, 7, {\em 9}
 \subitem {\bf foobar}, 13
  \subsubitem {\Large test}, 13
 \subitem sub1, 10
  \subsubitem subsub1, 12
 \subitem sub2 with see xref. See other1
 \subitem sub3 with see xref, but more refs, 11. See also other2
 \subitem subitem with see also xref, 11. See also other3
\item a test of sortby using "ttt", 15
\item {\em underst\"{a}nding tex}?, 12
\item yet another topic, 12
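
The page list on the first \item line above can be reproduced with a
short Python sketch. This is illustrative only and not idxtex's code. It
assumes that only arabic page numbers are merged, that a run must share
one font, and that a run needs at least three pages to become a range
(the sample shows nothing shorter, so min_run is a guess). The name
format_pages is hypothetical.

```python
def format_pages(refs, min_run=3):
    """Format (page, font) pairs, merging runs of consecutive pages.

    refs is a list of (page_string, font_or_None) in document order.
    """
    def wrap(page, font):
        # Wrap the page number in its font group, e.g. {\bf 4}.
        return "{%s %s}" % (font, page) if font else page

    out = []
    i = 0
    while i < len(refs):
        page, font = refs[i]
        j = i
        # Extend the run while pages are arabic, consecutive and share a font.
        while (page.isdecimal() and j + 1 < len(refs)
               and refs[j + 1][1] == font
               and refs[j + 1][0].isdecimal()
               and int(refs[j + 1][0]) == int(refs[j][0]) + 1):
            j += 1
        if j - i + 1 >= min_run:
            out.append(wrap(page, font) + "--" + wrap(refs[j][0], font))
            i = j + 1
        else:
            out.append(wrap(page, font))
            i += 1
    return ", ".join(out)

sample = [("i", None), ("ii", None), ("iii", None),
          ("1", None), ("2", None), ("3", None),
          ("4", "\\bf"), ("5", "\\bf"), ("6", "\\bf"),
          ("7", None), ("9", "\\em")]
pages = format_pages(sample)
```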
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Christopher Robin Thewalt		These opinions are not necessarily
thewalt@ce.cmu.edu			shared by my employer...
Carnegie Mellon University

dhosek@jarthur.Claremont.EDU (D.A. Hosek) (12/09/89)

You may want to look at the makeindex program (available from 
berkeley.edu by FTP, I believe) which is more or less "official".
(it's co-written by Leslie Lamport and Pehong Chen in C, with 
variants for Unix, VMS, Microsoft C, and Waterloo C
on VM/CMS. Someday, it is hoped that it will be translated into
WEB, but that will take somebody's time and effort).

-dh
-- 
"Odi et amo, quare id faciam, fortasse requiris?
   nescio, sed fieri sentio et excrucior"          -Catullus
D.A. Hosek.                        UUCP: uunet!jarthur!dhosek
                               Internet: dhosek@hmcvax.claremont.edu

jbw@unix.cis.pitt.edu (Jingbai Wang) (12/09/89)

In article <3478@jarthur.Claremont.EDU> dhosek@jarthur.UUCP (D.A. Hosek) writes:
>You may want to look at the makeindex program (available from 
>berkeley.edu by FTP, I believe) which is more or less "official".
>(it's co-written by Leslie Lamport and Pehong Chen in C, with 
>variants for Unix, VMS, Microsoft C, and Waterloo C
>on VM/CMS. Someday, it is hoped that it will be translated into
>WEB, but that will take somebody's time and effort).
>

The two do not really conflict, because one is meant for TeX and the
other for LaTeX, although I believe the time spent developing a TeX
makeindex could just as well have gone into a macro file that lets the
LaTeX makeindex understand index entries inserted in TeX.  From what I
saw, however, the one created for TeX seems to borrow more from Scribe
(as Scribe was heavily used at CMU, just as at Pitt).

Richard Stallman of GnuEmacs pointed out that a major weakness of TeX is
that it does not know how to sort indices, which is quite true. Otherwise
why would people develop makeindex? This strikes me as very strange. Sorting
strings is about the simplest thing to do when programming a text formatter;
maybe Knuth has it in TeX 3.0?

Whether it is internal or external, we will survive. Nevertheless, life
can still be improved. That is why I wrote indexor, which can be ftped
from june.cs.washington.edu as indexor.tar.Z for UNIX (including BSD,
V7, Ultrix, V), VMS, and DOS. It was originally developed for Scribe,
so it is not meant to be something like makeindex. It is actually a
special-purpose screen editor that lets you browse through the manuscript
word by word, line by line, character by character, or page by page. You
can mark a phrase that needs to be indexed and then press a key to produce
an index command. It has proved very useful in writing a long book
or manual. It can be configured at compile time to work with LaTeX or Scribe.

JB Wang

ken@cs.rochester.edu (Ken Yap) (12/09/89)

|Richard Stallman of GnuEmacs pointed out that a major weakness of TeX is
|that it does not know how to sort indices, which is quite true. Otherwise

Why should TeX have everything but the kitchen sink? Even Knuth, who
writes large programs, didn't go out and make TeX compile fonts or half
a dozen other things. Making indices is not something ordinary users
use often. Here we have only a couple of users who use makeindex
regularly. (No doubt this proportion is somewhat higher in sites that
do a lot of technical writing.) I would not like to see what is an
already moderately large program (200k or so) bloat with features that
could easily be implemented externally.

chris@mimsy.umd.edu (Chris Torek) (12/09/89)

In article <1989Dec8.184444.11467@cs.rochester.edu>
ken@cs.rochester.edu (Ken Yap) writes:
>Why should TeX have everything but the kitchen sink? ...

[ to be like Emacs, of course :-) ]

>I would not like to see what is an already moderately large program
>(200k or so) bloat with features that could easily be implemented
>externally.

(such as tbl, eqn, ...? oh never mind, I am just in a baiting mood at
the moment)

There is something, though, that would be very useful to have in TeX
that would take little space, add a great deal of power, and could be
done on most systems (albeit in a system-dependent fashion):  TeX
should be able to run a subprocess.  (Heck, even MS-DOS can do it,
although there is the minor fact that IBM PC TeXes tend to use all the
memory in the machine, so that you would not be able to do anything
in the spawned process.)

If one could write, e.g.,

	\system{make-index <foo.idx > foo.ind}
	\input{foo.ind}

(note that I am using LaTeX syntax here, something TeX could stand a
bit more of itself) the `index problem' would be taken care of, without
building the sorting directly into TeX.  Other things could be done
as well:

	\system{ls | text2tex > ls.tex}
	\input{ls}

and so forth.

Of course, all is not roses.  For one, the syntax of what goes inside
a \system invocation would be (alas!) system-dependent.  (You might also
need to get funny characters in, such as \ { } ` etc., which is tricky.)
No longer would people be able to mail TeX sources with confidence
that the recipient could use them---this is not a total loss, since this
already happens with fonts, to some extent.  Also, people would have to
exercise more care as to what they TeX.  At the moment, the worst TeX
will do is overwrite files.  With a \system capability, a malicious user
could send `documents' that do more.  (I find this argument not terribly
effective.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

rokicki@polya.Stanford.EDU (Tomas G. Rokicki) (12/09/89)

In article <21186@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes:
> There is something, though, that would be very useful to have in TeX
> that would take little space, add a great deal of power, and could be
> done on most systems (albeit in a system-dependent fashion):  TeX
> should be able to run a subprocess.

This is done on several implementations of TeX, including IBM/CMS and
Amiga.  The syntax is simply

\write18{makeindex \jobname}

for instance; you can precede it with \immediate if you want.  Thus,
you can do almost anything . . .  (It might not be 18 on all machines.)
The nice thing about this format is, if the extension isn't supported
at a particular machine, at least you can see on the screen and in the
log file the command that was supposed to be run.

On the Amiga, it can invoke an interactive ARexx script that can make
almost anything possible . . .

-tom

sc@qmet.UUCP (Steve Croft) (12/11/89)

ken@cs.rochester.edu (Ken Yap) writes:
> Why should TeX have everything but the kitchen sink? Even Knuth, who
> writes large programs, didn't go out and make TeX compile fonts or half
> a dozen other things. Making indices is not something ordinary users
> use often.

Remember that TeX was written for typesetting books.  I see many
books with indices nowadays.  ;)

Steve
-- 
******************************************************************************
*   If what I say is not correct,    *      Steve Croft, Qualimetrics, Inc.  *
*       then it's not what I meant!  *            (uunet!mmsac!qmet!sc)      *
******************************************************************************

dhosek@jarthur.Claremont.EDU (D.A. Hosek) (12/11/89)

>ken@cs.rochester.edu (Ken Yap) writes:
>> Why should TeX have everything but the kitchen sink? Even Knuth, who
>> writes large programs, didn't go out and make TeX compile fonts or half
>> a dozen other things. Making indices is not something ordinary users
>> use often.

Actually, it _is_ possible for TeX to sort an index. TeX is Turing-complete
so any task which is computable is in theory computable with TeX (I hope
I'm understanding what Turing-complete is all about, I'm not much of a 
C.S. person).

However, it is far easier to accomplish this task with an external 
program.

It's actually kind of amazing the things that TeX will do. The Project
Athena people at MIT have TeX playing "animals" and keeping the books
for their supply of soda (including printing periodic statements,
of course). Michael Wichura draws amazing statistical charts using PiCTeX. 
Don Knuth demonstrated that it's possible to set halftone drawings with 
TeX. The issue is less whether it can be done inside TeX than whether
TeX is the right tool. My pocket knife has a pair of scissors in it, but
if I want to cut things, I usually use regular scissors. TeX could sort
an index, but makeindex will do it faster and with less effort.

-dh

ath@prosys.se (Anders Thulin) (12/11/89)

In article <818@qmet.UUCP> sc@qmet.UUCP (Steve Croft) writes:
>
>Remember that TeX was written for typesetting books.  I see many
>books with indices nowadays.  ;)

Spot on! TeX was written for typesetting books.  And even an index
has to be prepared before it can be typeset. This preparation may
or may not be done by TeX, as your fancy takes you.
-- 
Anders Thulin, Programsystem AB, Teknikringen 2A, S-583 30 Linkoping, Sweden
ath@prosys.se   {uunet,mcsun}!sunic!prosys!ath

emcmanus@cs.tcd.ie (Eamonn McManus) (12/12/89)

D.A. Hosek writes:
> Actually, it _is_ possible for TeX to sort an index.

Here is one way to do it, for example.  Let the index entry for word foo
be represented by the macro \csname I!foo\endcsname.  This macro expands
to the following tokens: the macro for the index entry for a word
alphabetically less than the current one; ditto for an alphabetically
greater word; and the index text itself (page numbers or whatever).
Either or both of the two pointers can be null.  Hence we are storing the
index entries in a binary tree, and the usual techniques can be used for
inserting new entries and for traversing the tree in order.  The method
can be expanded to allow for sub-entries etc.

For example, if entries foo:1, bar:2, and spletch:3 have been added we
might have the following macros:
\indexroot -> \I!foo
\I!foo -> \I!bar \I!spletch 1
\I!bar -> \null \null 2
\I!spletch -> \null \null 3

Of course this method is completely impractical, because the macros to
implement it would be very slow and very hard to write, and more
importantly because in any practical TeX implementation you will run out
of memory while indexing a work of any reasonable size.  Other
possibilities exist, such as storing the entries in a file and performing
an external sort on this file; it is just that TeX is not the most
suitable tool for the job.
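
To make the data structure concrete outside of TeX, here is a small
Python sketch of the same binary tree (illustrative only; it is not the
TeX macros, which were not written out above). Each word maps to a list
of three things, mirroring the three groups of tokens in the macro: the
alphabetically lesser entry, the greater entry, and the index text. The
names insert and in_order are hypothetical, and appending the page when
a word is seen again is an added assumption.

```python
def insert(tree, root, word, text):
    """Insert word into the tree; return the (possibly new) root key."""
    if root is None:
        tree[word] = [None, None, text]
        return word
    key = root
    while True:
        if word == key:
            # Assumption: a repeated word just collects another page.
            tree[key][2] += ", " + text
            return root
        side = 0 if word < key else 1   # 0 = lesser pointer, 1 = greater
        child = tree[key][side]
        if child is None:
            tree[word] = [None, None, text]
            tree[key][side] = word
            return root
        key = child

def in_order(tree, key):
    """Yield (word, text) pairs in alphabetical order."""
    if key is None:
        return
    less, greater, text = tree[key]
    yield from in_order(tree, less)
    yield key, text
    yield from in_order(tree, greater)

tree, root = {}, None
for word, page in [("foo", "1"), ("bar", "2"), ("spletch", "3")]:
    root = insert(tree, root, word, page)
sorted_entries = list(in_order(tree, root))
```

After the three insertions the dictionary has the same shape as the
macros shown in the example: foo points to bar and spletch, and both of
those have null pointers.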
-- 
Eamonn McManus				emcmanus@cs.tcd.ie
Distributed Systems Group, TCD		...!uunet!mcsun!cs.tcd.ie!emcmanus
   "Kea: A New Zealand parrot that sometimes kills sheep." -- Chambers