[comp.text] Need help with page number 1 of X and TOC in [ntp]roff

apple@nprdc.arpa (James Apple) (08/15/89)

	hi
		i'm working on a document  that is composed of many sheets.
	each sheet may be one or more pages long.  So on the top of each
	sheet it must have page numbering like page 1 of 8, and on the
	bottom of each page the standard page number 1,2,3 etc ...

		the way i'm working on this now is to use the .tm command
	on the top of each page.  I dump the document to /dev/null and the
	terminal messages to a file and then grind through the file to 
	count the number of pages per sheet.  Then I format the document
	again and print it.  Ugly ......


		But it gets worse, I also need a table of contents that
	lists which page each sheet starts on.  So i print the whole document
	to /dev/null and then ......   

		By the time i've printed the whole document, which can be 
	hundreds of pages long,  i've formatted each page 4 times.


	HELP it seems that there has to be a better way to handle this.
	
	Summary:
		
	How do I get page x of x without formatting it twice ?

	How do I get a Table of contents with the starting page of each sheet
	on it ?

--------

	Any help would be great,  even if you just have an "idea" please
	let me know, i'll try anything.


	Thanks in Advance.
-- 
	Jim Apple WB1DOG

	apple@nprdc.navy.mil
	...}ucsd!nprdc!apple

lee@anduk.co.uk (Liam R. Quin) (08/16/89)

I'm following this up because it is quite a common question...

In article <3285@seashore.nprdc.arpa> apple@nprdc.arpa (James Apple) says
that he needs to have his pages numbered as `page 1 of 8', where the
8 is computed dynamically.  He also has to do a table of contents:
>	Summary:
>
>	How do I get a Table of contents with the starting page of each sheet
>	on it ?
I'll do this one first because more people do tables of contents!

There are two strategies.

(1)	Use a diversion, accumulate the Table-Of-Contents (TOC) entries as
	you go, and print it out at the end

(2)	Write the entries to a file as you go, and then format the file as
	a separate run.

Method (1) is generally the best if you can use it.  There are problems if
(a) the final document is composed of multiple files, each produced by a
    separate run of troff (or nroff, of course)

(b) the table of contents has to appear in itself, e.g.
	1 table of contents
	2 introduction
    because this causes problems if the TOC is more than one page long.

If either (a) or (b) applies, you can still use this method, but you
have to combine it with method (2) in some way, and it's usually simplest
to use method (2) in that case.

Some Things To Read:

  Dale Dougherty & Tim O'Reilly: Unix Text Processing, Hayden Books, 1987
  ISBN 0-672-46291-5
gives a good overview of this topic, especially in chapter 18.

  Alfred V. Aho, Brian W. Kernighan & Peter J. Weinberger:
  The Awk Programming Language, Addison-Wesley, 1988, ISBN 0-201-07981-X
gives examples of using AWK to prepare an index.
Both of these topics are covered in lots of other books on Unix or Text
processing.

A general caveat is that most of the troff books make some serious errors
when describing tabs and leaders, probably because Ossanna's explanation,
whilst accurate, was a little terse.


So, here's method (1) in some detail.

If you do a single run of troff to print the entire document, you can use
a single diversion.  I'm typing this from memory, but the strategy is right.
	.de Te\" Table of contents entry
	. ev 2\" use a different environment so as not to affect the main text
	.  da _t\" keep the table of contents in a diversion called _t
	.  ta \\n(.luR\" (you need to set up the line length in ev 2)
	.  \" now the actual entry, to look like this:
	.  \" Care and feeding of directors ............................  8
	\&\\$1\ \a\ \\n%
	.  br
	.  da
	. ev
	..
if you want to have a contents line that looks like
1.3 Care and feeding of directors ...................................... 19
you need to use
	.ta x \\n(.luR
where x is the indent of `Care and...'.

The \a produces the row of dots, and there should be a fixed width space
either side of it, to stop the row of dots (`the leader') from bumping into
the title or the numbers and looking odd.

The tab is Right adjusted so that the page numbers all line up.
You can make the dots go further apart like this:
	.  fp 7 PI\" some font that has dots, but that you are not using
	.  cs 7 1m \" make the characters spread out in that font
	\&\\$1\ \f7\a\fP\ \\n%
	.  cs 7\" undo the cs effect

Now, in your end-macro, do
	.de Em \" end macro
	\&\c\" make an unfinished word, so troff will do another page
	' bp
	' af % i\" page footer numbers are now i, ii, etc.
	' nr % 1
	' \" do the table of contents header here
	' \" now replay the _t diversion:
	' sp
	' nf
	' ev 2
	' _t
	' ev
	..
	.em Em
You may need to do
	.di _t
	.di
before using .da _t.


Now, Method (2), in rather less detail.

If you are using lots of little files (e.g. one per section),
you will have to collect the information.
You can do this with .tm, being careful not to swallow any troff error
messages.  Also, the major commercial troff clones have the ability to
write to a file.  Proficient and eroff use redirection, for example
	.tm anything >> my-file
to append.  Sqtroff (just to be different) uses
	.xopen internal-name flags unix-name
and
	.xwrite internal-name anything
e.g.,
	.xopen toc wf table-of-contents
	.xwrite toc \\$1 \\$2 ..... (just like .tm)
Use af for the flags to append to the file instead of overwriting it.
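
One way to avoid swallowing error messages when collecting .tm output is to
tag every data line and filter on the tag afterwards.  This is only a sketch:
the "TOC: " tag, the function name and the file names are all made up for
illustration, and it assumes troff's standard error was saved in a log file.

```shell
# split_tm LOG DATAFILE
# Assumption: the macros wrote their data with ".tm TOC: ..." (the tag is
# a hypothetical convention).  Tagged lines go to DATAFILE with the tag
# stripped; everything else (genuine troff complaints) is echoed to stderr.
split_tm() {
    sed -n 's/^TOC: //p' "$1" > "$2"
    grep -v '^TOC: ' "$1" >&2
    return 0    # grep -v exits 1 when there were no error lines
}

# Possible use (file names are hypothetical):
#   troff -ms section3 2> section3.log > section3.out
#   split_tm section3.log section3.toc
```

Anything that does not carry the tag still reaches the terminal, so a real
error is not lost among the contents entries.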

After you have printed the entire document, you can end up with one or more
files containing lots of
	.Te 45 3.1 "Care and feeding of directors"
which you can merge (if necessary) using sort, and feed through troff using
method (1), as above.
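
The merge step might look something like the sketch below.  It assumes every
line has the shape shown above, with the page number as the second
blank-separated field; the function name and the file names in the comment
are hypothetical.

```shell
# merge_toc FILE...
# Assumption: each line looks like
#     .Te 45 3.1 "Care and feeding of directors"
# so field 2 is the page number.  Output is all entries in page order.
merge_toc() {
    sort -k2,2n "$@"
}

# Possible use (file names are hypothetical):
#   merge_toc sec1.toc sec2.toc sec3.toc > contents.t
```

Sorting numerically on the page field matters: a plain sort would put
page 12 before page 7.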


Now, if the contents page goes at the beginning of the document and is numbered
in sequence with the main body of the document... that is to say, if the
contents pages start on page 1 and finish on page 2, the next page is numbered
3, you may have to format everything twice if the contents turn out to be
longer than you expected.
Of course, since you still have the same number of sections, the contents page
won't change size this time, although the numbers on it will change.


Now to look at your other question.
>
>	How do I get page x of x without formatting it twice ?

You can use diversions to do this.  You may have problems on a PDP-11,
an 80286, or other 16 bit system, though.
Check that a number register can hold the height of 8 (say) pages:

	.nr a 8*29.7c \" assuming A4 paper
	.tm \na
	.nr a 29.7c
	.tm \na
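and make sure that the numbers are all positive!  A 16-bit signed register
tops out at 32767, so a larger value wraps round and shows up negative.
I get 56112 and 7015 on a 600dpi device; 16 pages on a 300dpi device come
to the same total.  If you get strange results, you will have to be careful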
using the built-in registers \\n(dn, \\n(.t and \\n(.h as these may easily
overflow.
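
The same arithmetic can be estimated outside troff before committing to the
diversion approach.  This is a rough sketch only: the function name is made
up, it assumes a signed 16-bit register (limit 32767), and it uses awk's
floating point, so its figures can differ by a few units from what troff
itself reports.

```shell
# fits_16bit PAGES HEIGHT_CM DPI
# Prints the approximate total height in machine units and exits 0 if
# that fits in a signed 16-bit register, 1 if it would overflow.
fits_16bit() {
    awk -v n="$1" -v cm="$2" -v dpi="$3" 'BEGIN {
        units = int(n * cm / 2.54 * dpi)   # cm -> inches -> device units
        print units
        exit (units > 32767)               # status 1 means overflow
    }'
}

# Possible use:
#   fits_16bit 8 29.7 600 || echo "8 A4 pages will not fit at 600dpi"
```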


Having sorted that out, you should be able to store all your text in a
diversion.  You could use a separate diversion for the body of each page --
for example
	.di P\\n(s?
where s? holds the sheet number (the 3 in 3 of 8).
Then print the diversions (use recursion), generating the headers; at this
point you have all the information you need.
Hence, you would have page header and/or footer macros that simply
did
	.nr s? 0
	.
	.de (s\" start section
	. br\" stop earlier text falling into the diversion
	. nr s? +1\" started one more sheet
	. nr s1 \\n(s?\" remember the first number
	. di P\\n(s? \" where s? is the sheet number
	. dt \\n(XXu !p\" start a new "page" at XX units
	..
	.
	.de )s\" end of section
	. br\" remember to include the last line (may spring a trap)
	. di
	. _o \\n(s1 \\n(s?\" called to do the output, with pages s1...s?
	..
	.
	.de !p\" start a new page
	. di
	. nr s? +1
	. di P\\n(s?
	. dt \\n(XXu !p\" start a new "page" at XX units
	..
	.
	.de _o \" output routine
	. PAGE-HEADER\" or this could be a trap
	. P\\$1\" replay the diversion that holds this page
	. PAGE-FOOTER\" or this could be a trap
	. bp\" start a new page for the next one
	. nr __ \\$1+1
	. if \\n(__<=\\$2 ._o \\n(__ \\$2\" print the next page (recursively)
	..
	.

If you don't follow all this, feel free to send some mail.

Of course, if you combine the TOC run with the page 1 of N run, you might be
able to do it all in two goes anyway...
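
If you do settle for two passes, the first pass can supply the N for every
sheet.  The sketch below assumes the page-top macro emits one line of the
form "SHEET n" with .tm for each page, n being the current sheet number;
that tag and the function name are illustrative, not part of any macro
package.

```shell
# pages_per_sheet LOG
# Assumption: LOG holds the .tm output of the first pass, one "SHEET n"
# line per formatted page.  Prints "sheet pages" pairs in sheet order;
# lines that do not start with SHEET (e.g. error messages) are ignored.
pages_per_sheet() {
    awk '$1 == "SHEET" { n[$2]++ }
         END { for (s in n) print s, n[s] }' "$1" | sort -n
}
```

The pairs could then be turned into number register settings for the
second pass, so the header can print "page x of N" directly.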

You might also consider using makefiles, and having each section depend on
a file containing the page number of the previous section.
In your makefile, create a temporary file with the ending page number,
and do
	cmp tmpfile.$(SEC) lastpage.$(SEC) || cp tmpfile.$(SEC) lastpage.$(SEC)
so that the file upon which the next section depends will only be touched
if the last page changes.
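
Outside a makefile the same idiom can be wrapped in a small function.  This
is a sketch; the function name is made up, and the file names in the comment
just follow the ones used above.

```shell
# update_if_changed NEW OLD
# Copies NEW over OLD only when the contents differ (or OLD is missing),
# so OLD keeps its old modification time when nothing changed and make
# does not rebuild whatever depends on it.
update_if_changed() {
    cmp -s "$1" "$2" 2>/dev/null || cp "$1" "$2"
}

# Possible use:
#   update_if_changed tmpfile.3 lastpage.3
```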

Brian Kernighan and Rob Pike describe this in The Unix Programming
Environment; they use the technique with yacc and lex (in `hoc') rather
than with troff.  BWK also did it in his ditroff/pic, eqn, etc. makefiles.


Hope this helps.

Lee

>	Thanks in Advance.
>	Jim Apple WB1DOG
		  ^ I'm sorry, I don't understand if this is part of your name.
>	apple@nprdc.navy.mil
>	...}ucsd!nprdc!apple

--
Lee Russell Quin, Unixsys UK Ltd, The Genesis Centre, Birchwood, Warrington,
ENGLAND, WA3 7BH; +44 925 828181; JANET:  uk.ac.warwick.uu!anduk.co.uk!lee
lee%anduk.uucp@ai.toronto.edu |``All those against, raise your hands and say
{utzoo,uunet}!utai!anduk!lee  |		`I resign' ''

foessmei@lan.informatik.tu-muenchen.dbp.de (Reinhard Foessmeier) (08/18/89)

In article <33@nx32s.anduk.co.uk> lee@nx32s.UUCP (0000-Liam R. Quin) writes
   (about getting an alternative output channel from troff)
:
>You can do this with .tm, being careful not to swallow any troff error
>messages.  Also, the major commercial troff clones have the ability to
>write to a file.  

Using ditroff, you can also make use of its comment facility: ditroff
postprocessors ignore lines starting with a #.  So you can include such
lines in your ordinary output by saying

        \!#4 to go to channel 4

and afterwards collect them with

        grep "^#4" troff.el | cut -c3- ...

Alright, this will cost some time, but it's the computer's time,
not mine... :-)
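
The collection step can be wrapped up so that several channels are pulled
out of one output file.  A sketch, assuming single-digit channel numbers as
in the example; the function name is made up.

```shell
# collect_channel DIGIT FILE
# Prints the text of every "#DIGIT..." line in FILE with the two-character
# marker removed.  The blank that followed the marker is kept, exactly as
# with the grep/cut pipeline above.
collect_channel() {
    grep "^#$1" "$2" | cut -c3-
}

# Possible use:
#   collect_channel 4 troff.el > channel4
```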

Reinhard F\"ossmeier, Technische Univ. M\"unchen |  Long live
foessmeier@infovax.informatik.tu-muenchen.dbp.de |    the children
   [ { relay.cs.net | unido.uucp } ]             |       of our parents!