[comp.lang.postscript] Shorter version of PostScript "Recycle" symbol

glenn@heaven.woodside.ca.us (Glenn Reid) (06/27/90)

In article <6750@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>This is really neat, because the shell script logically *removed* the
>standard Adobe Illustrator header, and now Glenn is *adding* a header
>to make it even smaller.
>
>My experience with the shell script is that it makes small files smaller
>and large files larger.  This is not so counterintuitive when you realize
>that using a header instead of bare PS code is a trade-off.

What you really want the shell script to do is not to expand the
procedure calls back into raw PostScript, but to eliminate procedure
definitions from the prologue that are not called at all.  Or, replace
them with equivalently useful ones like I did as a post-process to your
output file.  In other words, keep track of which procedures are called
(like "c" or "f" or whatever) and rather than expanding them, just make
sure they are in the prologue (but none of the other stuff is).
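The bookkeeping Glenn describes can be sketched in a few lines. This is a hypothetical illustration, not Glenn's post-process or the real Illustrator prologue: it tokenizes the script body, then keeps only those prologue definitions whose names are actually called (the sample prologue lines are invented).

```python
import re

def trim_prologue(prologue: str, script: str) -> str:
    # Tokenize the script body to find which names it actually calls
    used = set(re.findall(r"[^\s{}\[\]/()]+", script))
    kept = []
    for line in prologue.splitlines():
        m = re.match(r"/(\w+)\s", line)      # a definition like "/l /lineto load def"
        if m is None or m.group(1) in used:  # keep non-definitions and used defs
            kept.append(line)
    return "\n".join(kept)

prologue = ("/l /lineto load def\n"
            "/c /curveto load def\n"
            "/f {gsave fill grestore} bind def")
script = "0 0 moveto 72 72 l f"
print(trim_prologue(prologue, script))
# "/c /curveto load def" is dropped: the script never calls "c"
```

A real version would have to handle multi-line definitions and names hidden inside procedure bodies, which is where the caller/callee bookkeeping comes in.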

The notion of prologue definitions without any script to call them
reaches its most absurd when you do a command-K dump of a Macintosh
print file or when you create an Adobe Illustrator 88 file with one
line or one character in it (in the Illustrator case, you get all
the color separation stuff and all the procedure definitions; something
like 12800 bytes of unused definitions).

The main purpose of defining prologue procedures is so that you can
represent the document in less space, and so it can execute more
efficiently.  But this effect is nullified (and in fact reversed)
when the document is short.

Just observations, of course, not value judgements.  Although applications
could keep track of what was drawn and write out a different prologue
accordingly, that's not very practical all of the time.  But then, neither
is a library of, say, 100 Illustrator drawings where 1.3 megabytes of
the disk space used is the same prologue saved over and over again, in
which large parts of the prologue are never used at all.

(Glenn) cvn

-- 
% Glenn Reid				PostScript consultant
% glenn@heaven.woodside.ca.us		Free Estimates
% ..{adobe,next}!heaven!glenn		Unparalleled Quality

zben@umd5.umd.edu (Ben Cranston) (06/28/90)

In article <186@heaven.woodside.ca.us> glenn@heaven.UUCP (Glenn Reid) writes:

> What you really want the shell script to do is not to expand the
> procedure calls back into raw PostScript, but to eliminate procedure
> definitions from the prologue that are not called at all.  Although
> applications could keep track of what was drawn and write out a different
> prologue accordingly, that's not very practical all of the time.

I could see initially writing an application to keep track of which parts
of the prolog were actually used, but if you let the prolog procedures
call each other it could be a maintenance nightmare keeping the
caller/callee matrix updated.

On the other hand, this is EXACTLY the problem most linkage-editors deal
with every day and by and large this is a known technology.  Sure is true
that there is nothing really new under the sun...

If the Illustrator prolog is always exactly the same, why bother to write
it out into the save file at all?  I assume it is utterly ignored when a
file is read in.  Then provide a separate program or a menu function to
write "a *complete* postscript file", including the prolog.

BTW, is it really true that you can save execution time by using prolog
procedures rather than the "raw" lineto, curveto, etc primitives?  Seems to
me there's always got to be SOME interpreter overhead hit, and since the
numbers always change you're not saving string-to-num time.  Can it really
cost so much to lookup "lineto"?

Actually, assuming a one-for-one l == lineto c == curveto then there is
exactly one lookup in either case.  So is there any runtime saving, or are
you in fact trading runtime away for space with this simple-minded prolog?

Just random thoughts -- PostScript sure is a neat toy!

-- 

Ben Cranston <zben@umd2.umd.edu>
Warm and Fuzzy Networking Group, Egregious State University
My cat is named "Perpetually Hungry Autonomous Carbon Unit"; I call him "Sam".

glenn@heaven.woodside.ca.us (Glenn Reid) (06/29/90)

In article <6761@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>If the Illustrator prolog is always exactly the same, why bother to write
>it out into the save file at all?  I assume it is utterly ignored when a
>file is read in.  Then provide a separate program or a menu function to
>write "a *complete* postscript file", including the prolog.

Sounds like a great idea to me.  I've done it myself as a post-process, with
shell scripts, when I have lots of illustrations included in one larger
document (FrameMaker).

>BTW, is it really true that you can save execution time by using prolog
>procedures rather than the "raw" lineto, curveto, etc primitives?  Seems to
>me there's always got to be SOME interpreter overhead hit, and since the
>numbers always change you're not saving string-to-num time.  Can it really
>cost so much to lookup "lineto"?

Depends on whether or not they're really procedures; see below.

>Actually, assuming a one-for-one l == lineto c == curveto then there is
>exactly one lookup in either case.  So is there any runtime saving, or are
>you in fact trading runtime away for space with this simple-minded prolog?

When you do something like "/l /lineto load def" you set up the world so
that two names both point to the same operator:  "l" and "lineto".  These
take exactly the same amount of time to look up and execute, as you
mention, but the "l" takes 1/5 the space and 1/5 the transmission time
to get to the interpreter, and the scanner only has to read 1/5 as many
characters to read the name.  So you save execution time, but on the
front end, not inside the interpreter itself.
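The front-end saving is easy to put numbers on. A toy calculation (invented coordinates, not Adobe's actual output) for a path of many segments: the name "l" is 1/6 the characters of "lineto", though the whole-line saving is smaller because the coordinates don't shrink.

```python
# Bytes the scanner must read for a path of `segments` lineto-style calls,
# assuming each line looks like "72 72 lineto\n" (two coords + operator name).
def path_bytes(op_name: str, segments: int) -> int:
    return segments * len(f"72 72 {op_name}\n")

print(path_bytes("lineto", 1000))  # long operator name
print(path_bytes("l", 1000))       # one-letter alias: same interpreter work, fewer bytes
```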

If you define a procedure like "/l { lineto } def" you actually LOSE,
because you incur two name lookups (once for "l" and once for "lineto"
inside the procedure) plus the procedure call overhead.  Using "bind"
helps a little bit.

But if the procedure is longer, and you use "bind":

	/f { gsave 1 setgray fill grestore stroke } bind def

it gets a little more complicated.  You lose some on procedure overhead,
you gain lots in space savings and transmission time, and you save some
name lookup time since you used "bind" on the procedure.  All in all,
this is a WIN and is why the procedure mechanism exists to begin with.

When I looked through the recycle code, I noticed recurring patterns
of instructions and data which were `fixed'; perfect candidates to be
captured in a procedure.  Otherwise I just shortened the names of the
operators you called, saving a bit of space and transmission time, and
therefore making it execute a little bit faster.

/Glenn

-- 
% Glenn Reid				PostScript consultant
% glenn@heaven.woodside.ca.us		Free Estimates
% ..{adobe,next}!heaven!glenn		Unparalleled Quality

zben@umd5.umd.edu (Ben Cranston) (06/30/90)

Along the lines of shorter names, one of the people using the A.I. to
Vanilla shell script noted it errored with a divide by zero when processing
vertical text (yeah, the old arctan problem -- a physicist added lasers to
our spacewar program in 1972, guess what happened when you fired your laser
straight up...)  Anyway, I tried to put in the fix and the awk call started
complaining about "arguments too long".

The structure is some preliminary options stuff then the construct:

awk 'BEGIN {
<7 pages of a pretty complicated awk program>
}' $*

and the additions pushed the size of the program over 8192 characters and
evidently Unix has that limit on the size of "program arguments".

Given the choice of purging the comments, throwing away the indentation,
or shortening the variable names, I decided:

s/currgray/cgr/g           - Current gray value on output channel
s/fillg/fgr/g              - Current "fill" gray value input channel
s/strokeg/sgr/g            - Current "stroke" gray value input channel

This made just about enough room for one more case in the code that reads a
"matrix" and decomposes it back into a "translation", "rotation", and "scale".

So here's another reason for using shorter names...

-- 

Ben Cranston <zben@umd2.umd.edu>
Warm and Fuzzy Networking Group, Egregious State University
My cat is named "Perpetually Hungry Autonomous Carbon Unit"; I call him "Sam".

glenn@heaven.woodside.ca.us (Glenn Reid) (07/02/90)

In article <6785@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>               Anyway, I tried to put in the fix and the awk call started
> complaining about "arguments too long".
>
> The structure is some preliminary options stuff then the construct:
>
> awk 'BEGIN {
> <7 pages of a pretty complicated awk program>
> }' $*
>
> and the additions pushed the size of the program over 8192 characters and
> evidently Unix has that limit on the size of "program arguments".
>
> Given the choice of purging the comments, throwing away the indentation,
> or shortening the variable names, I decided:

This is of course getting off the beaten path of PostScript, but heck,
we all love to write shell scripts, too.  And there's always something
cosmic when a principle (like shortening names) applies in such different
circumstances.

Anyway, what I tend to do in this situation is to make a temporary file
with the awk script in it, so you don't care how long it is.  Something
along these lines:

    #!/bin/sh
    cat > /tmp/awk.$$ << 'END_OF_SCRIPT'
    BEGIN {
      <7 pages of a pretty complicated awk program>
    }
    END_OF_SCRIPT
    awk -f /tmp/awk.$$ $*

But this method has its complications, too, of course.  I just thought I'd
toss in my $.02....

Cheers,
 Glenn
-- 
% Glenn Reid				PostScript/NeXT consultant
% glenn@heaven.woodside.ca.us		One-day turnaround on many projects
% ..{adobe,next}!heaven!glenn		Unparalleled Quality

jimc@isc-br.ISC-BR.COM (Jim Cathey) (07/03/90)

In article <6785@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>awk 'BEGIN {
><7 pages of a pretty complicated awk program>
>}' $*
>
>and the additions pushed the size of the program over 8192 characters and
>evidently Unix has that limit on the size of "program arguments".
>
>Given the choice of purging the comments, throwing away the indentation,
>or shortening the variable names, I decided:

What I've seen done is to use sed to chew up the nice human-readable format...

FIND=`sed -e "s/	//" -e "s/ *#.*//" <<'EOT'
	gnarly indented awk program here 	# with comments like this..
EOT
`
awk "$FIND" arguments

You can even stick in some -e "s/longvariable/lv/g" substitutions to trim it
down even more.  (That first substitution strips the leading tab from each
line, by the way.)

+----------------+
! II      CCCCCC !  Jim Cathey
! II  SSSSCC     !  ISC-Bunker Ramo
! II      CC     !  TAF-C8;  Spokane, WA  99220
! IISSSS  CC     !  UUCP: uunet!isc-br!jimc (jimc@isc-br.iscs.com)
! II      CCCCCC !  (509) 927-5757
+----------------+
			"With excitement like this, who is needing enemas?"

john@trigraph.uucp (John Chew) (07/04/90)

In <6761@umd5.umd.edu> zben@umd5.umd.edu (Ben Cranston) writes:
>I could see initally writing an application to keep track of which parts
>of the prolog were actually used, but if you let the prolog procedures
>call each other it could be a maintenance nightmare keeping the
>caller/callee matrix updated.

No nightmare.  I wrote a Perl script to do this a few months ago.  We keep 
several hundred logos online that have been created by Adobe Illustrator or 
Adobe Streamline, and several hundred prologues add up to a lot of disk space. 
I went through the standard prologue by hand once and built up a dependency 
tree.  The Perl script uses an associative array to keep track of which tokens 
have been defined, and spits out necessary definitions as needed.  Yes, it 
messes up the prologue/script dichotomy for the sake of easy programming, 
but not doing so is left as an exercise to the reader.
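John's scheme translates naturally out of Perl. A loose Python analogue (the dependency data and definition bodies below are invented for illustration, not his actual prologue): a hand-built dependency tree, plus a dictionary of already-emitted tokens so each definition is written at most once, callees before callers.

```python
# Hypothetical dependency tree and bodies; a real one would be built
# by reading the standard prologue once, as John describes.
deps = {
    "f": ["setfill"],   # suppose "f" calls "setfill" internally
    "s": [],
    "setfill": [],
}
bodies = {
    "f": "/f {setfill fill} bind def",
    "s": "/s {stroke} bind def",
    "setfill": "/setfill {1 setgray} bind def",
}

def emit(token, defined, out):
    """Append token's definition (and its dependencies first) exactly once."""
    if token in defined:
        return
    defined[token] = True
    for dep in deps.get(token, []):
        emit(dep, defined, out)      # callees before callers
    out.append(bodies[token])

defined, out = {}, []
for tok in ["f", "s"]:               # tokens one particular logo actually uses
    emit(tok, defined, out)
print("\n".join(out))
```

The `defined` dictionary plays the role of John's associative array; unused tokens simply never get emitted.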

John
-- 
john j. chew, iii   		  phone: +1 416 425 3818     AppleLink: CDA0329
trigraph, inc., toronto, canada   {uunet!utai!utcsri,utgpu,utzoo}!trigraph!john
dept. of math., u. of toronto     poslfit@{utorgpu.bitnet,gpu.utcs.utoronto.ca}