glenn@heaven.woodside.ca.us (Glenn Reid) (06/27/90)
In article <6750@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>This is really neat, because the shell script logically *removed* the
>standard Adobe Illustrator header, and now Glenn is *adding* a header
>to make it even smaller.
>
>My experience with the shell script is that it makes small files smaller
>and large files larger.  This is not so counterintuitive when you realize
>that using a header instead of bare PS code is a trade-off.

What you really want the shell script to do is not to expand the
procedure calls back into raw PostScript, but to eliminate procedure
definitions from the prologue that are not called at all.  Or, replace
them with equivalently useful ones, as I did as a post-process to your
output file.  In other words, keep track of which procedures are called
(like "c" or "f" or whatever) and, rather than expanding them, just make
sure they are in the prologue (but none of the other stuff is).

The notion of prologue definitions without any script to call them
reaches its most absurd when you do a Command-K dump of a Macintosh
print file, or when you create an Adobe Illustrator 88 file with one
line or one character in it (in the Illustrator case, you get all the
color separation stuff and all the procedure definitions; something
like 12800 bytes of unused definitions).

The main purpose of defining prologue procedures is so that you can
represent the document in less space, and so it can execute more
efficiently.  But this effect is nullified (and in fact reversed) when
the document is short.

Just observations, of course, not value judgements.  Although
applications could keep track of what was drawn and write out a
different prologue accordingly, that's not very practical all of the
time.  But then, neither is a library of, say, 100 Illustrator drawings
where 1.3 megabytes of the disk space used is the same prologue saved
over and over again, in which large parts of the prologue are never
used at all.
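The pruning Glenn describes can be sketched in the thread's own medium, shell plus awk.  This is a hypothetical sketch, not Glenn's actual post-process: it assumes the easy case where every prologue definition fits on one line of the form "/name ... def" and the prologue ends at a %%EndProlog comment, and it ignores procedures that call one another (those would need a dependency pass).

```shell
#!/bin/sh
# Hypothetical sketch: keep only the prologue definitions that the
# script body actually calls.  Assumes one-line "/name ... def"
# definitions and a %%EndProlog marker; ignores defs that call defs.

sample=/tmp/sample.$$.ps
cat > "$sample" <<'EOF'
/l /lineto load def
/m /moveto load def
/z /setgray load def
%%EndProlog
0 0 m
72 72 l
stroke
EOF

pruned=$(awk '
    NR == FNR {                      # pass 1: collect tokens used in the body
        if (inbody) for (i = 1; i <= NF; i++) used[$i] = 1
        if ($0 ~ /^%%EndProlog/) inbody = 1
        next
    }
    /^\/[A-Za-z]+ .* def$/ {         # pass 2: a one-line definition...
        if (!(substr($1, 2) in used)) next   # ...never called: drop it
    }
    { print }
' "$sample" "$sample")

printf '%s\n' "$pruned"
rm -f "$sample"
```

Run on the sample above, the unused "/z" definition disappears while "/l", "/m", and the script body survive intact.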
(Glenn) cvn
--
 % Glenn Reid				PostScript consultant
 % glenn@heaven.woodside.ca.us		Free Estimates
 % ..{adobe,next}!heaven!glenn		Unparalleled Quality
zben@umd5.umd.edu (Ben Cranston) (06/28/90)
In article <186@heaven.woodside.ca.us> glenn@heaven.UUCP (Glenn Reid) writes:
> What you really want the shell script to do is not to expand the
> procedure calls back into raw PostScript, but to eliminate procedure
> definitions from the prologue that are not called at all.  Although
> applications could keep track of what was drawn and write out a different
> prologue accordingly, that's not very practical all of the time.

I could see initially writing an application to keep track of which parts
of the prolog were actually used, but if you let the prolog procedures
call each other it could be a maintenance nightmare keeping the
caller/callee matrix updated.  On the other hand, this is EXACTLY the
problem most linkage editors deal with every day, and by and large it is
a known technology.  Sure is true that there is nothing really new under
the sun...

If the Illustrator prolog is always exactly the same, why bother to write
it out into the save file at all?  I assume it is utterly ignored when a
file is read in.  Then provide a separate program or a menu function to
write "a *complete* PostScript file", including the prolog.

BTW, is it really true that you can save execution time by using prolog
procedures rather than the "raw" lineto, curveto, etc. primitives?  Seems
to me there's always got to be SOME interpreter overhead hit, and since
the numbers always change you're not saving string-to-number time.  Can it
really cost so much to look up "lineto"?

Actually, assuming a one-for-one mapping (l == lineto, c == curveto),
there is exactly one lookup in either case.  So is there any runtime
saving, or are you in fact trading runtime away for space with this
simple-minded prolog?

Just random thoughts -- PostScript sure is a neat toy!
--
Ben Cranston <zben@umd2.umd.edu>
Warm and Fuzzy Networking Group, Egregious State University
My cat is named "Perpetually Hungry Autonomous Carbon Unit"; I call him "Sam".
glenn@heaven.woodside.ca.us (Glenn Reid) (06/29/90)
In article <6761@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>If the Illustrator prolog is always exactly the same, why bother to write
>it out into the save file at all?  I assume it is utterly ignored when a
>file is read in.  Then provide a separate program or a menu function to
>write "a *complete* PostScript file", including the prolog.

Sounds like a great idea to me.  I've done it myself as a post-process,
with shell scripts, when I have lots of illustrations included in one
larger document (FrameMaker).

>BTW, is it really true that you can save execution time by using prolog
>procedures rather than the "raw" lineto, curveto, etc. primitives?  Seems
>to me there's always got to be SOME interpreter overhead hit, and since
>the numbers always change you're not saving string-to-number time.  Can it
>really cost so much to look up "lineto"?

Depends on whether or not they're really procedures; see below.

>Actually, assuming a one-for-one mapping (l == lineto, c == curveto),
>there is exactly one lookup in either case.  So is there any runtime
>saving, or are you in fact trading runtime away for space with this
>simple-minded prolog?

When you do something like "/l /lineto load def" you set up the world so
that two names both point to the same operator: "l" and "lineto".  These
take exactly the same amount of time to look up and execute, as you
mention, but the "l" takes 1/5 the space and 1/5 the transmission time
to get to the interpreter, and the scanner only has to read 1/5 as many
characters to read the name.  So you save execution time, but on the
front end, not inside the interpreter itself.

If you define a procedure like "/l { lineto } def" you actually LOSE,
because you incur two name lookups (once for "l" and once for "lineto"
inside the procedure) plus the procedure call overhead.  Using "bind"
helps a little bit.

But if the procedure is longer, and you use "bind":

	/f { gsave 1 setgray fill grestore stroke } bind def

it gets a little more complicated.
You lose some on procedure overhead, you gain lots in space savings and
transmission time, and you save some name lookup time since you used
"bind" on the procedure.  All in all, this is a WIN, and is why the
procedure mechanism exists to begin with.

When I looked through the recycle code, I noticed recurring patterns of
instructions and data which were "fixed"; perfect candidates to be
captured in a procedure.  Otherwise I just shortened the names of the
operators you called, saving a bit of space and transmission time, and
therefore making it execute a little bit faster.

/Glenn
--
 % Glenn Reid				PostScript consultant
 % glenn@heaven.woodside.ca.us		Free Estimates
 % ..{adobe,next}!heaven!glenn		Unparalleled Quality
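Glenn's front-end argument is easy to check from the shell.  A small sketch (the 1000-line path and the temp-file names are invented for the demo): generate a path using the full "lineto" name, abbreviate it to "l" the way the prologue idiom does, and compare byte counts.  The interpreter performs the same single lookup either way; only scanning and transmission shrink.

```shell
#!/bin/sh
# Demo of the front-end saving: renaming "lineto" to "l" saves 5 bytes
# per call, which is pure transmission/scanning time to the printer.

long=/tmp/long.$$.ps
awk 'BEGIN { for (i = 0; i < 1000; i++) printf "%d %d lineto\n", i, i }' > "$long"

short=/tmp/short.$$.ps
sed 's/ lineto$/ l/' "$long" > "$short"

lbytes=$(wc -c < "$long")
sbytes=$(wc -c < "$short")
echo "long: $lbytes bytes, short: $sbytes bytes, saved: $((lbytes - sbytes))"

rm -f "$long" "$short"
```

For 1000 calls the abbreviation saves 5000 bytes before a single byte of it is interpreted, which is the whole point of the "/l /lineto load def" style.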
zben@umd5.umd.edu (Ben Cranston) (06/30/90)
Along the lines of shorter names: one of the people using the A.I. to
Vanilla shell script noted that it died with a divide by zero when
processing vertical text (yeah, the old arctangent problem -- a physicist
added lasers to our spacewar program in 1972; guess what happened when
you fired your laser straight up...).  Anyway, I tried to put in the fix
and the awk call started complaining about "arguments too long".
The structure is some preliminary options stuff, then the construct:

	awk 'BEGIN {
	<7 pages of a pretty complicated awk program>
	}' $*

and the additions pushed the size of the program over 8192 characters;
evidently Unix has that limit on the size of "program arguments".
Given the choice of purging the comments, throwing away the indentation,
or shortening the variable names, I decided:

	s/currgray/cgr/g	- current gray value, output channel
	s/fillg/fgr/g		- current "fill" gray value, input channel
	s/strokeg/sgr/g		- current "stroke" gray value, input channel
This made just about enough room for one more case in the code that reads a
"matrix" and decomposes it back into a "translation", "rotation", and "scale".
So here's another reason for using shorter names...
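Renames like Ben's can be applied mechanically and their payoff measured.  A hedged sketch: the two-line program below is an invented stand-in for the real 7-page awk script, but the three sed substitutions are exactly the ones listed above.

```shell
#!/bin/sh
# Apply the three renames with sed and report how many bytes they buy.
# The "program" here is a tiny stand-in, not the real awk script.

prog=/tmp/prog.$$.awk
cat > "$prog" <<'EOF'
currgray != fillg  { currgray = fillg }
currgray != strokeg { currgray = strokeg }
EOF

before=$(wc -c < "$prog")
sed -e 's/currgray/cgr/g' -e 's/fillg/fgr/g' -e 's/strokeg/sgr/g' \
    "$prog" > "$prog.new"
after=$(wc -c < "$prog.new")

echo "saved $((before - after)) bytes"
rm -f "$prog" "$prog.new"
```

Even this toy saves 32 bytes (5 per currgray, 2 per fillg, 4 per strokeg, each appearing twice); over seven pages of awk the same trick is what squeezes the program back under the 8192-character argument limit.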
--
Ben Cranston <zben@umd2.umd.edu>
Warm and Fuzzy Networking Group, Egregious State University
My cat is named "Perpetually Hungry Autonomous Carbon Unit"; I call him "Sam".
glenn@heaven.woodside.ca.us (Glenn Reid) (07/02/90)
In article <6785@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
> Anyway, I tried to put in the fix and the awk call started
> complaining about "arguments too long".
>
> The structure is some preliminary options stuff then the construct:
>
> awk 'BEGIN {
> <7 pages of a pretty complicated awk program>
> }' $*
>
> and the additions pushed the size of the program over 8192 characters and
> evidently Unix has that limit on the size of "program arguments".
>
> Given the choice of purging the comments, throwing away the indentation,
> or shortening the variable names, I decided:

This is of course getting off the beaten path of PostScript, but heck, we
all love to write shell scripts, too.  And there's always something cosmic
when a principle (like shortening names) applies in such different
circumstances.

Anyway, what I tend to do in this situation is to make a temporary file
with the awk script in it, so you don't care how long it is.  Something
along these lines:

	#!/bin/sh
	cat > /tmp/awk.$$ << 'END_OF_SCRIPT'
	BEGIN {
	<7 pages of a pretty complicated awk program>
	}
	END_OF_SCRIPT
	awk -f /tmp/awk.$$ $*
	rm -f /tmp/awk.$$

But this method has its complications, too, of course.  I just thought
I'd toss in my $.02....

Cheers,
Glenn
--
 % Glenn Reid				PostScript/NeXT consultant
 % glenn@heaven.woodside.ca.us		One-day turnaround on many projects
 % ..{adobe,next}!heaven!glenn		Unparalleled Quality
jimc@isc-br.ISC-BR.COM (Jim Cathey) (07/03/90)
In article <6785@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>awk 'BEGIN {
><7 pages of a pretty complicated awk program>
>}' $*
>
>and the additions pushed the size of the program over 8192 characters and
>evidently Unix has that limit on the size of "program arguments".
>
>Given the choice of purging the comments, throwing away the indentation,
>or shortening the variable names, I decided:

What I've seen done is to use sed to chew up the nice human-readable
format...

	FIND=`sed -e "s/	//" -e "s/ *#.*//" <<'EOT'
	gnarly indented awk program here
	# with comments like this...
	EOT
	`
	awk "$FIND" arguments

You can even stick in some -e "s/longvariable/lv/g" substitutions to trim
it down even more.  (That first substitution deletes tab characters, by
the way.)

+----------------+
!  II  CCCCCC   !  Jim Cathey
!  II  SSSSCC   !  ISC-Bunker Ramo
!  II      CC   !  TAF-C8; Spokane, WA  99220
!  IISSSS  CC   !  UUCP: uunet!isc-br!jimc  (jimc@isc-br.iscs.com)
!  II  CCCCCC   !  (509) 927-5757
+----------------+
"With excitement like this, who is needing enemas?"
john@trigraph.uucp (John Chew) (07/04/90)
In <6761@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>I could see initially writing an application to keep track of which parts
>of the prolog were actually used, but if you let the prolog procedures
>call each other it could be a maintenance nightmare keeping the
>caller/callee matrix updated.

No nightmare.  I wrote a Perl script to do this a few months ago.  We
keep several hundred logos online that have been created by Adobe
Illustrator or Adobe Streamline, and several hundred prologues add up to
a lot of disk space.  I went through the standard prologue by hand once
and built up a dependency tree.  The Perl script uses an associative
array to keep track of which tokens have been defined, and spits out the
necessary definitions as needed.  Yes, it messes up the prologue/script
dichotomy for the sake of easy programming, but not doing so is left as
an exercise to the reader.

John
--
john j. chew, iii         phone: +1 416 425 3818      AppleLink: CDA0329
trigraph, inc., toronto, canada   {uunet!utai!utcsri,utgpu,utzoo}!trigraph!john
dept. of math., u. of toronto     poslfit@{utorgpu.bitnet,gpu.utcs.utoronto.ca}
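The same associative-array bookkeeping John describes can be sketched in awk rather than Perl.  Everything below is an invented stand-in, not John's script or the real Illustrator prologue: a hand-built dependency table, a "done" array of tokens already emitted, and a recursive walk that prints each needed definition after its dependencies.

```shell
#!/bin/sh
# Sketch of dependency-driven prologue emission: emit only the
# definitions reachable from what the script uses, dependencies first.
# Procedure names, bodies, and the dependency table are made up.

defs=$(awk '
BEGIN {
    # hand-built dependency tree: procedure -> procedures it calls
    deps["F"] = "f g"
    deps["S"] = "g"

    # the definition text for each token (stand-ins)
    body["F"] = "/F { f g } bind def"
    body["f"] = "/f { fill } bind def"
    body["g"] = "/g { setgray } bind def"
    body["S"] = "/S { g stroke } bind def"

    # suppose the script body turned out to use only F
    need("F")
}
function need(tok,   n, parts, i) {
    if (tok in done) return                  # already emitted: skip
    done[tok] = 1
    n = split(deps[tok], parts, " ")
    for (i = 1; i <= n; i++) need(parts[i])  # dependencies first
    print body[tok]
}
')

printf '%s\n' "$defs"
```

Here "f" and "g" come out before "F" that calls them, and the unused "S" never appears, which is exactly the disk-space win for a library of hundreds of logos.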