[comp.protocols.tcp-ip] Decrypting RFC 1125

dave@CITI.UMICH.EDU (Dave Bachmann) (11/24/89)

If you're like me, you don't have a Postscript printer at home, but you also
don't want to wait until Monday to read the latest RFC, RFC1125.  Well, you're
in luck.  I finally decided "The heck with having an RFC I can't read!" and I
hacked up an awk script to decompile the Postscript in RFC 1125 into relatively
readable text form.  Of course, there were a few little details, like the back-
to-front printing of the document, which meant I got page 18 before page 17, 
and this weird business of representing "ff" by \013, "fi" by \014 and so on.
So, it's not the elegant script I had hoped for. But it works.  The script is
available for ftp on citi.umich.edu as pub/unps.awk.  You'll also want the file
cleanup.sed, which takes care of the \013 business, as well as parenthesis
quoting.  I'll also give these two at the end of this message, since they're
so short.  Warning: this is all very empirical, and full of magic numbers.  To
produce a useable file from rfc1125.ps, first do "awk -f unps.awk rfc1125.ps".
This will produce the files "page18" through "page9" and then die complaining
about only being able to write to 10 files.  So now do "awk -f unps.awk limit=8
rfc1125.ps", which tells unps.awk to skip any pages > 8. It now has produced
"page8" down to "page1". So now "cat page? page?? | sed -f cleanup.sed > 
rfc1125.txt" and you're done.  I've also put the result in pub/rfc1125.txt for
those who are impatient.
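
A note on the glob order in that last command, as a minimal sketch rather
than part of the original recipe: "page?" matches only the single-digit
names and "page??" only the two-digit ones, so cat sees the pages in numeric
order, whereas a plain "page*" would sort page10 ahead of page2.  The
function name page_order_demo and the scratch directory are made up for
illustration; the files are empty stand-ins, not real unps.awk output.

```shell
# Sketch only: create empty files named page1..page18 in a scratch
# directory and compare the order the two glob spellings produce.
page_order_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        i=1
        while [ "$i" -le 18 ]; do
            : > "page$i"          # empty stand-in for one page file
            i=$((i + 1))
        done
        # The quoted label is printed literally; the unquoted globs expand.
        echo "page? page??:" page? page??
        echo "page*:" page*
    )
    rm -rf "$dir"
}

page_order_demo
# First line lists page1 page2 ... page9 page10 ... page18;
# second line starts page1 page10 page11 ... and ends ... page8 page9.
```
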
  After I had gotten this working I excitedly looked to see if it would work
for the other Postscript RFC's.  No such luck.  EVERY AUTHOR OF A POSTSCRIPT
RFC HAS USED A DIFFERENT PACKAGE.  In fact, the only RFC's that share a common
format are the NTP family.  Oh well.
  Here they are:
---------
unps.awk
---------
#	This script tries to decompile a TeX-produced Postscript document
#	and produce a file for each page.  This is necessary to handle
#	documents that print back-to-front.  Each page goes into a file
#	named "page<n>" where n is the page number.
#	There are a lot of magic numbers here.  Trial and error.
#
#	Track current page number
#	Specified as "<n> @bop1" where n is the new page number
#
$2 == "@bop1" { oline = 0
                pagenum = $1
                line = "" }
#
#	Since awk can only write out to 10 files, we need a way to
#	skip the high-numbered pages (which come first in the file)
#	on a second pass.  To process only pages up to and including
#	page x, invoke with "limit=x"
#
{ if (limit+0 > 0 && limit+0 < pagenum+0) next }
#
#	Lines of the form "<n> r (<string>) s" are moving n points right
#	and writing string.  I'm emitting one space for any move of more
#	than 5 points, plus one more for each additional 25 points.
#	Lines of the form "<n> r <m> c" are moving n points right and writing
#	the ASCII character with code m.
#
$2 == "r" { dots = $1
            while (dots > 5) { dots = dots - 25
                              line = line " " }
	    if ($4 == "s") { token = $3
			     wordl = length(token) - 2
			     word = substr(token,2,wordl)
			     line = line word }
	    else line = line sprintf("%c", $3) }
#
#	Lines of the form "<x> <y> p <stuff>" are positioning to coordinates
#	x,y on the page and doing something.  If stuff ends in "ru" it's
#	drawing something, so ignore it.  Otherwise find out how much the
#	y coordinate has changed and map that to newlines.  I'm emitting
#	one newline for any change of 30 points or more, plus one more for
#	each additional 48 points.  This is where we print out the previous
#	line that we've been building.
#
$3 == "p" { if ($6 == "ru") next
            ldiff = $2 - oline
            oline = $2
            while (ldiff > 29) { ldiff = ldiff - 48
                                 print line > ("page" pagenum)
                                 line = "" }
            if ($5 == "s") { token = $4
	                     wordl = length(token) - 2
	                     word = substr(token,2,wordl) 
	                     line = line word }
            if ($5 == "c") line = line sprintf("%c", $4) }
#
#	Sometimes it just writes a string without positioning.
#
$2 == "s" { token = $1
            wordl = length(token) - 2
            word = substr(token,2,wordl)
            line = line word }
#
#	Sometimes it just writes a character without positioning.
#
$2 == "c" { line = line sprintf("%c", $1) }
#
#	End of the page.  Print the previously built line, if any.
#
$1 == "@eop" {print line > ("page" pagenum) }
#
#	That's all.
---------
cleanup.sed
---------
s/\\013/ff/g
s/\\014/fi/g
s/\\015/fl/g
s/\\016/ffi/g
s/\\(/(/g
s/\\)/)/g
---------
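
To see what cleanup.sed does without fetching the RFC, here is a minimal
sketch that applies the same six substitutions inline to a made-up sample
line.  The function name cleanup and the sample text are assumptions for
illustration only; the sample is not taken from RFC 1125.

```shell
# The same six substitutions as cleanup.sed, given with -e so the demo
# does not depend on the file being present.
cleanup() {
    sed -e 's/\\013/ff/g' \
        -e 's/\\014/fi/g' \
        -e 's/\\015/fl/g' \
        -e 's/\\016/ffi/g' \
        -e 's/\\(/(/g' \
        -e 's/\\)/)/g'
}

# Made-up input containing three ligature escapes and quoted parentheses.
printf '%s\n' 'a di\013erent \014le \(o\016cial\)' | cleanup
# prints: a different file (official)
```
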

Dave Bachmann                                   |  dave@citi.umich.edu
Center for Information Technology Integration   |  {mailrus,rutgers}!citi!dave
University of Michigan                          |  (313)998-7693 or 8-7479

P.S.  Happy Thanksgiving

jgreely@oz.cis.ohio-state.edu (J Greely) (12/06/89)

In article <8911240620.AA06208@ucbvax.Berkeley.EDU> dave@CITI.UMICH.EDU
 (Dave Bachmann) writes:
>  After I had gotten this working I excitedly looked to see if it would work
>for the other Postscript RFC's.  No such luck.  EVERY AUTHOR OF A POSTSCRIPT
>RFC HAS USED A DIFFERENT PACKAGE.  In fact, the only RFC's that share a common
>format are the NTP family.  Oh well.

Ran a quick check of the four PostScript formatted RFCs I found here
(1119, 1125, 1128, and 1129), and there are two macro packages in use,
only one of which is worthwhile.  1125 uses pscat from Adobe's
TranScript package to post-process troff output into PS (thank you!).
The other (GEM-something-or-other) makes non-portable assumptions,
mangles the Adobe Document Structuring Conventions, and simply won't
print on all PS devices (guaranteed not to print on a NeXT, which is
the only system that otherwise would allow it to be viewed on-screen).
The EPS figures included look okay, but everything else is bogus.

  I would suggest that future PostScript-format RFCs be required to
conform to the published conventions, or all hell will break loose
when someone decides to use BrokenWord, whose output is printable only
on a directly-attached Apple LaserWriter (note: I'm not picking on any
particular WP package, but there are several that are almost that
bad).  Unfortunately, there's no PS validation tool, although some
ideas are floating around comp.lang.postscript.

  Call me a purist, but if I can't print it page-reversed,
double-sided, two-up, and in signature order, it ain't PostScript.
(incidentally, this is the most convenient form I've found for
carrying RFCs around; try it, you'll like it)
-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)

henry@utzoo.uucp (Henry Spencer) (12/07/89)

In article <JGREELY.89Dec5181242@oz.cis.ohio-state.edu> J Greely <jgreely@cis.ohio-state.edu> writes:
>Ran a quick check of the four PostScript formatted RFCs I found here
>(1119, 1125, 1128, and 1129), and there are two macro packages in use,
>only one of which is worthwhile.  1125 uses pscat from Adobe's
>TranScript package to post-process troff output into PS...

Hmm.  This means, of course, that except for illustrations (haven't looked
at 1125 myself), it would be trivial to supply an ASCII-text version of
1125 -- just run through nroff instead of troff.  It would Sure Be Nice
to have a greppable version...
-- 
1233 EST, Dec 7, 1972:         |     Henry Spencer at U of Toronto Zoology
last ship sails for the Moon.  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jgreely@scarecrow.cis.ohio-state.edu (J Greely) (12/07/89)

In article <1989Dec6.173258.1036@utzoo.uucp> henry@utzoo.uucp
 (Henry Spencer) writes:
>Hmm.  This means, of course, that except for illustrations (haven't looked
>at 1125 myself), it would be trivial to supply an ASCII-text version of
>1125 -- just run through nroff instead of troff.  It would Sure Be Nice
>to have a greppable version...

Sigh.  Correct thought, wrong RFC.  I hadn't realized until now that
we had miscopied 1124 as 1125 here.  1124 is the "troff | pscat"
output, 1125 is "TeX | dvi2ps", where dvi2ps is an old, ugly,
non-conforming dvi converter.  Take all of my negative comments about
the other PS RFCs, and apply them to 1125.  1124 is, however, fine,
although running it through nroff would still be useful for many
people.  The loss of the illustrations may hurt it (I haven't read the
text, just the PostScript; I'm not near a printer right now!), but I'd
consider the increased convenience worth it.

  Actually, the same thought applies to TeX.  Dvidoc produces
reasonably formatted ASCII text from TeX documents, and it's widely
available.
-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)

Mills@UDEL.EDU (12/07/89)

J,

While I don't stick up for the GEM folk, who supplied the windowing
environment for Xerox Ventura Publisher, which was used to prepare
RFC-1119/-1128/-1129, I must admit that I had to munge the PostScript
output file to turn what appears as two PostScript documents into only
one by deleting the preamble to the second document. The problem arises
because some CAP packages, Ventura among them, find it easiest to produce
tables of contents as a separate document and combine them during the
printing process. Now, you could bum Xerox for such a rash assumption
or bum the Unix spoolers that don't like two documents in one envelope
or bum me for fumbling the combining process. Life goes on.

Dave

rdroms@NRI.RESTON.VA.US (12/08/89)

I've written two .sty files that might be of interest to this
discussion.  The first, rfc.sty, generates RFC-style output (title
page, headers, footers, etc.) from LaTeX.  The second, txt.sty,
generates a .dvi file that can be run through dvi2tty to produce
well-formatted (IMHO, better than stock dvi2tty or dvidoc) ASCII
output.  I generated the PostScript and ASCII versions of the Dynamic
Host Configuration Internet Draft using these .sty files.

At present, txt.sty still needs more work, primarily to track down and
eliminate all the rubber vertical glue.  Dvi2tty could also use some
work to improve spacing of characters in both dimensions.  For
example, horizontal and vertical bars (actually, rules in general) are
not handled well.

Is there general interest in these .sty files?  How many RFCs or other
documents might actually be produced in both PostScript and ASCII from
TeX?  I'd like to know if it's worth my time to put more effort into
fine-tuning these tools.

- Ralph Droms                             (On leave from Bucknell University)
  NRI                                     rdroms@nri.reston.va.us
  1895 Preston White Drive, Suite 100     (703) 620-8990
  Reston, VA 22091                        (703) 620-0913 (fax)