dave@CITI.UMICH.EDU (Dave Bachmann) (11/24/89)
If you're like me, you don't have a Postscript printer at home, but you also don't want to wait until Monday to read the latest RFC, RFC1125. Well, you're in luck. I finally decided "The heck with having an RFC I can't read!" and I hacked up an awk script to decompile the Postscript in RFC 1125 into relatively readable text form.

Of course, there were a few little details. The document prints back-to-front, which meant I got page 18 before page 17. There is also this weird business of representing "ff" by \013, "fi" by \014 and so on. So, it's not the elegant script I had hoped for. But it works.

The script is available for ftp on citi.umich.edu as pub/unps.awk. You'll also want the file cleanup.sed, which takes care of the \013 business, as well as parenthesis quoting. Since they're so short, I'll also give both files at the end of this message.

Warning: this is all very empirical, and full of magic numbers. To produce a usable file from rfc1125.ps, first do "awk -f unps.awk rfc1125.ps". This will produce the files "page18" through "page9" and then die complaining about only being able to write to 10 files. So now do "awk -f unps.awk limit=8 rfc1125.ps", which tells unps.awk to skip any pages > 8. It now has produced "page8" down to "page1". So now "cat page? page?? | sed -f cleanup.sed > rfc1125.txt" and you're done. I've also put the result in pub/rfc1125.txt for those who are impatient.

After I had gotten this working I excitedly looked to see if it would work for the other Postscript RFC's. No such luck. EVERY AUTHOR OF A POSTSCRIPT RFC HAS USED A DIFFERENT PACKAGE. In fact, the only RFC's that share a common format are the NTP family. Oh well.

Here they are:

--------- unps.awk ---------
# This script tries to decompile a TeX-produced Postscript document
# and produce a file for each page. This is necessary to handle
# documents that print back-to-front. Each page goes into a file
# named "page<n>" where n is the page number.
# There are a lot of magic numbers here. Trial and error.

#
# Track current page number
# Specified as "<n> @bop1" where n is the new page number
#
$2 == "@bop1" {
	oline = 0
	pagenum = $1
	line = ""
}

#
# Since awk can only write out to 10 files, we need a way to
# skip the first n pages before starting to write to files.
# To process only pages up to and including page x, invoke with "limit=x"
#
{ if (limit+0 > 0 && limit+0 < pagenum+0) next }

#
# Lines of the form "<n> r (<string>) s" are moving n points right
# and writing string. I'm mapping a space to every 25 points, starting
# at 5 and above.
# Lines of the form "<n> r <m> c" are moving n points right and writing
# the ascii character m.
#
$2 == "r" {
	dots = $1
	while (dots > 5) {
		dots = dots - 25
		line = line " "
	}
	if ($4 == "s") {
		token = $3
		wordl = length(token) - 2
		word = substr(token,2,wordl)
		line = line word
	}
	else
		# +0 makes awk treat the field as a character code, not a string
		line = line sprintf("%c", $3 + 0)
}

#
# Lines of the form "<x> <y> p <stuff>" are positioning to coordinates
# x,y on the page and doing something. If stuff ends in "ru" it's
# drawing something, so ignore it. Otherwise find out how much the
# y coordinate has changed and map that to newlines. I'm mapping a
# line to every 48 points, starting at 30. This is where we print out
# the previous line that we've been building.
#
$3 == "p" {
	if ($6 == "ru") next
	ldiff = $2 - oline
	oline = $2
	while (ldiff > 29) {
		ldiff = ldiff - 48
		# parenthesized: some awks mis-parse a bare concatenation after >
		print line > ("page" pagenum)
		line = ""
	}
	if ($5 == "s") {
		token = $4
		wordl = length(token) - 2
		word = substr(token,2,wordl)
		line = line word
	}
	if ($5 == "c")
		line = line sprintf("%c", $4 + 0)
}

#
# Sometimes it just writes a string without positioning.
#
$2 == "s" {
	token = $1
	wordl = length(token) - 2
	word = substr(token,2,wordl)
	line = line word
}

#
# Sometimes it just writes a character without positioning.
#
$2 == "c" { line = line sprintf("%c", $1 + 0) }

#
# End of the page. Print the previously built line, if any.
#
$1 == "@eop" { print line > ("page" pagenum) }

#
# That's all.
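The two magic-number loops in unps.awk are easiest to check in isolation. The sketch below is not part of Dave's script. It copies the same thresholds into two hypothetical shell functions, `spaces_for` and `newlines_for`: one space per 25 points for horizontal moves over 5 points, and one line per 48 points for vertical moves over 29 points. Each function runs its loop through awk and prints the resulting count.

```shell
#!/bin/sh
# Hypothetical helpers isolating the empirical mappings used by unps.awk.
# The thresholds are copied from the script; the function names are made up.

# spaces_for N: number of spaces unps.awk emits for an "N r ..." move
spaces_for() {
	awk -v d="$1" 'BEGIN {
		d = d + 0          # force a numeric value
		n = 0
		while (d > 5) {    # same loop as the $2 == "r" rule
			d = d - 25
			n++
		}
		print n
	}'
}

# newlines_for DY: number of "print line" calls for a y change of DY
newlines_for() {
	awk -v d="$1" 'BEGIN {
		d = d + 0
		n = 0
		while (d > 29) {   # same loop as the $3 == "p" rule
			d = d - 48
			n++
		}
		print n
	}'
}

# Moves of 3, 20 and 55 points right give 0, 1 and 2 spaces.
echo "$(spaces_for 3) $(spaces_for 20) $(spaces_for 55)"
# y changes of 0, 48 and 96 points give 0, 1 and 2 prints.
echo "$(newlines_for 0) $(newlines_for 48) $(newlines_for 96)"
```

`newlines_for` counts `print` calls. The first call flushes the text built so far, and any further calls come out as blank lines. Negative moves (leftward, or up the page) produce nothing.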
--------- cleanup.sed ---------
s/\\013/ff/g
s/\\014/fi/g
s/\\015/fl/g
s/\\016/ffi/g
s/\\(/(/g
s/\\)/)/g
---------
Dave Bachmann                                  | dave@citi.umich.edu
Center for Information Technology Integration  | {mailrus,rutgers}!citi!dave
University of Michigan                         | (313)998-7693 or 8-7479

P.S. Happy Thanksgiving
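To show what cleanup.sed does to the decompiled text, the sketch below wraps the same substitutions in a shell function (the name `cleanup` is hypothetical) and runs it on a made-up line. The extra `\017 -> ffl` rule is an addition. It assumes the standard TeX text-font layout, in which positions '013 through '017 hold ff, fi, fl, ffi and ffl. The original file stops at \016, presumably because that was all RFC 1125 needed.

```shell
#!/bin/sh
# Sketch: cleanup.sed's substitutions as a shell function, plus one
# assumed extra rule (\017 -> ffl) that the original file does not have.
cleanup() {
	sed -e 's/\\013/ff/g' \
	    -e 's/\\014/fi/g' \
	    -e 's/\\015/fl/g' \
	    -e 's/\\016/ffi/g' \
	    -e 's/\\017/ffl/g' \
	    -e 's/\\(/(/g' \
	    -e 's/\\)/)/g'
}

# A made-up line as unps.awk might leave it: ligature codes and
# backslash-quoted parentheses still in place.
printf '%s\n' 'an e\013ective \014lter \(see the \015ow\)' | cleanup
```

With GNU or BSD sed, this should print `an effective filter (see the flow)`.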
jgreely@oz.cis.ohio-state.edu (J Greely) (12/06/89)
In article <8911240620.AA06208@ucbvax.Berkeley.EDU> dave@CITI.UMICH.EDU (Dave Bachmann) writes:
> After I had gotten this working I excitedly looked to see if it would work
>for the other Postscript RFC's. No such luck. EVERY AUTHOR OF A POSTSCRIPT
>RFC HAS USED A DIFFERENT PACKAGE. In fact, the only RFC's that share a common
>format are the NTP family. Oh well.

Ran a quick check of the four PostScript formatted RFCs I found here (1119, 1125, 1128, and 1129), and there are two macro packages in use, only one of which is worthwhile. 1125 uses pscat from Adobe's TranScript package to post-process troff output into PS (thank you!). The other (GEM-something-or-other) makes non-portable assumptions, mangles the Adobe Document Structuring Conventions, and simply won't print on all PS devices (guaranteed not to print on a NeXT, which is the only system that otherwise would allow it to be viewed on-screen). The EPS figures included look okay, but everything else is bogus.

I would suggest that future PostScript-format RFCs be required to conform to the published conventions, or all hell will break loose when someone decides to use BrokenWord, whose output is printable only on a directly-attached Apple LaserWriter (note: I'm not picking on any particular WP package, but there are several that are almost that bad). Unfortunately, there's no PS validation tool, although some ideas are floating around comp.lang.postscript.

Call me a purist, but if I can't print it page-reversed, double-sided, two-up, and in signature order, it ain't PostScript. (incidentally, this is the most convenient form I've found for carrying RFCs around; try it, you'll like it)

-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)
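Here is a concrete version of the page-reversal test. In a file that follows the Document Structuring Conventions, each page starts with a `%%Page:` comment and the trailer starts with `%%Trailer`, so reversing the page order comes down to cutting the file at those comments. The sketch below does that with a hypothetical `reverse_pages` shell function; it is not an existing tool. It also shows why mangled structuring comments break this kind of processing: the cut points are only as trustworthy as the comments.

```shell
#!/bin/sh
# reverse_pages: sketch of page reversal for a DSC-conforming PostScript
# file (hypothetical helper, not an existing tool). Assumes:
#   - everything before the first "%%Page:" line is prolog/setup,
#   - each page runs from its "%%Page:" line up to the next one,
#   - "%%Trailer" (or "%%EOF" if there is no trailer) ends the last page.
# Embedded documents carrying their own %%Page: lines will confuse it,
# and the page ordinals are left as they were rather than renumbered.
reverse_pages() {
	awk '
	/^%%(Trailer|EOF)/      { intrailer = 1 }
	/^%%Page:/ && !intrailer { np++ }
	{
		if (intrailer)   trailer = trailer $0 "\n"
		else if (np > 0) page[np] = page[np] $0 "\n"
		else             prolog = prolog $0 "\n"
	}
	END {
		printf "%s", prolog
		for (i = np; i >= 1; i--)   # last page first
			printf "%s", page[i]
		printf "%s", trailer
	}' "$@"
}
```

Usage would be `reverse_pages in.ps > out.ps`, or reading from a pipe with no file argument.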
henry@utzoo.uucp (Henry Spencer) (12/07/89)
In article <JGREELY.89Dec5181242@oz.cis.ohio-state.edu> J Greely <jgreely@cis.ohio-state.edu> writes:
>Ran a quick check of the four PostScript formatted RFCs I found here
>(1119, 1125, 1128, and 1129), and there are two macro packages in use,
>only one of which is worthwhile. 1125 uses pscat from Adobe's
>TranScript package to post-process troff output into PS...

Hmm. This means, of course, that except for illustrations (haven't looked at 1125 myself), it would be trivial to supply an ASCII-text version of 1125 -- just run through nroff instead of troff. It would Sure Be Nice to have a greppable version...
-- 
1233 EST, Dec 7, 1972:        | Henry Spencer at U of Toronto Zoology
last ship sails for the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
jgreely@scarecrow.cis.ohio-state.edu (J Greely) (12/07/89)
In article <1989Dec6.173258.1036@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Hmm. This means, of course, that except for illustrations (haven't looked
>at 1125 myself), it would be trivial to supply an ASCII-text version of
>1125 -- just run through nroff instead of troff. It would Sure Be Nice
>to have a greppable version...

Sigh. Correct thought, wrong RFC. I hadn't realized until now that we had miscopied 1124 as 1125 here. 1124 is the "troff | pscat" output, 1125 is "TeX | dvi2ps", where dvi2ps is an old, ugly, non-conforming dvi converter. Take all of my negative comments about the other PS RFCs, and apply them to 1125. 1124 is, however, fine, although running it through nroff would still be useful for many people. The loss of the illustrations may hurt it (I haven't read the text, just the PostScript; I'm not near a printer right now!), but I'd consider the increased convenience worth it.

Actually, the same thought applies to TeX. Dvidoc produces reasonably formatted ASCII text from TeX documents, and it's widely available.

-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)
Mills@UDEL.EDU (12/07/89)
J,

While I don't stick up for the GEM folk, who supplied the windowing environment for Xerox Ventura Publisher, which was used to prepare RFC-1119/-1128/-1129, I must admit that I had to munge the PostScript output file. What appeared as two PostScript documents had to be turned into one, which I did by deleting the preamble to the second document. The problem arises because some CAP packages, Ventura among them, find it easiest to produce tables of contents as a separate document and combine them during the printing process. Now, you could bum Xerox for such a rash assumption, or bum the Unix spoolers that don't like two documents in one envelope, or bum me for fumbling the combining process. Life goes on.

Dave
rdroms@NRI.RESTON.VA.US (12/08/89)
I've written two .sty files that might be of interest to this discussion. The first, rfc.sty, generates RFC-style output (title page, headers, footers, etc.) from LaTeX. The second, txt.sty, generates a .dvi file that can be run through dvi2tty to produce well-formatted (IMHO, better than stock dvi2tty or dvidoc) ASCII output. I generated the PostScript and ASCII versions of the Dynamic Host Configuration Internet Draft using these .sty files.

At present, txt.sty still needs more work, primarily to track down and eliminate all the rubber vertical glue. Dvi2tty could also use some work to improve spacing of characters in both dimensions. For example, horizontal and vertical bars (actually, rules in general) are not handled well.

Is there general interest in these .sty files? How many RFCs or other documents might actually be produced in both PostScript and ASCII from TeX? I'd like to know if it's worth my time to put more effort into fine-tuning these tools.

- Ralph Droms                          (On leave from Bucknell University)
  NRI                                  rdroms@nri.reston.va.us
  1895 Preston White Drive, Suite 100  (703) 620-8990
  Reston, VA 22091                     (703) 620-0913 (fax)