[comp.lang.forth] Strippers

schmidtg@iccgcc.decnet.ab.com (02/08/91)

Sometime ago in a posting regarding metacompilation, a passing reference
was made to a "stripper" program.  A stripper removes all words not used
within a given application.  Has anybody developed/used such a program?
If so, I would be interested in learning more about it.  Here are some of
my own thoughts/questions about how a stripper might work.


	Method 1.

	The stripper works in conjunction with the metacompiler.  The
	metacompiler is modified to keep a reference count for each
	defined word in the symbol table.  After metacompilation, if
	the reference count is zero, the word is defined, but not
	referenced and may therefore be eliminated in the subsequent
	metacompilation.  Now comes the fun part.  The metacompilation
	is rerun and these words are ignored.  How is this done?  One
	way might be to have the symbol table from the previous run
	provide input to the next run.  When the metacompiler encounters
	a defining word, it checks the "old" symbol table to see if
	it has been referenced.  If not, it redirects output from the
	target image to the bit bucket until the next defining word
	is encountered.  Has anybody tried a scheme like this?


	Method 2.

	The stripper compresses a running forth system.  The forth
	system is loaded along with the application.  The stripper
	program is then loaded, it's 1st word it "TASK".  A list
	of application words is given to the stripper (e.g. STRIP FOO
	STRIP BAR STRIP BLETCH).  The stripper recursively examines
	the call tree ("used tree?") of each word and adds each word
	encountered to it's symbol table.  Starting from "TASK" and
	working towards lower memory, look at each word and determine
	if the word is in the symbol table.  If it is, do nothing and
	go on to the next word.  If not, and there are no other
	referenced words above this one, advance a pointer which
	points to the last word in the system.  This effectively
	"deletes" this word.  Now for some real fun. If the word is
	not referenced, but other referenced words are above it, then
	shift all the words above this one down in memory by the size
	of this word.  Doing this of course invalidates all other words
	which use this word.  If your system is token threaded (I'm
	certain it's not) you can just change a single entry in the
	token list.  Otherwise the stripper now parses the entire
	program (or rather what's left of it after a partial strip)
	and fixes up all references to this words.  Obviously, this
	part of the task is more difficult if you opt for subroutine
	threading and inline code.  Also, when the words are "shifted",
	the LFA of the previous shifted word must change too.  This
	process is continued until the root word is reached.  Any
	vocabulary pointers must now be changed and the new system is
	then saved out to disk.  A disadvantage of this method is that
	it requires enough memory to hold the unstripped application
	plus the stripper program.

What do people think of these methods?  Are there better ways do this
aside from what I have suggested.  There may be some holes too!
Let me know what you think.



-- 


=============================================================================
Greg Schmidt -> schmidtg@iccgcc.decnet.ab.com
=============================================================================
"People with nothing to hide have nothing to fear from O.B.I.T"
	-- Peter Lomax
-----------------------------------------------------------------------------
Disclaimer: No warranty is expressed or implied.  Void where prohibited.
=============================================================================

UNBCIC@BRFAPESP.BITNET (02/10/91)

Take a look at TCOM. At Simtel, it's located in FPC353-5.ZIP. Maybe FPC353-4
too.

                              (8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET

wmb@MITCH.ENG.SUN.COM (02/12/91)

> was made to a "stripper" program.  A stripper removes all words not used
> within a given application.  Has anybody developed/used such a program?

Yes, many vendors have such a tool.

>       Method 2.
>
>       The stripper compresses a running forth system.

That's how mine works.

>       and fixes up all references to this words.  Obviously, this
>       part of the task is more difficult if you opt for subroutine
>       threading and inline code.

Not really.  Any threading technique other than token threading is about
equally difficult to relocate the references.  It helps a lot to have a
relocation bitmap identifying which memory locations contain pointers.

>       A disadvantage of this method is that
>       it requires enough memory to hold the unstripped application
>       plus the stripper program.

The stripper program, although rather complex, is not all that big.

Mitch Bradley, wmb@Eng.Sun.COM

koopman@a.gp.cs.cmu.edu (Philip Koopman) (02/16/91)

>     Method 1.
>
>     The stripper works in conjunction with the
>     metacompiler.  ...
>
     That's a nice method, and it's used in the AstroFORTH automatic target
compiler. The  resulting code  consists only of those words, which are used
(directly or  not) by  the main  (root) word of the program being compiled.
There are two remarks:

     1. It is impossible (or, maybe, too complex to be useful) to handle an
unrestricted CREATE  ... DOES>  construction (though it is a common problem
of metacompilation).  The data  structure, created  by a defining word, may
include pointers  to some  words, and  nobody knows  about it, except DOES>
part. OK,  too egg-headed metacompiler probably can understand something in
DOES>, but what about ;CODE ?

     2.  The   algorithm  with  the  reference  counting  you  describe  is
incomplete. Imagine:

     : A ... ; : B ... A ... ; : C ... ;

where C  is the  word to  be compiled.  Reference count  A will be nonzero,
though it  will be  zero for B. So the code for C is to include code for A,
which is not used (and can be much more larger than code for C).
     The correct algorithm will be something like that:

     BEGIN
         LOOK-THROUGH-THE-SOURCE-AND-SET-REFERENCE-COUNTS
         ARE-THERE-WORDS-WITH-ZERO-REFERENCE-COUNT? WHILE
         REMOVE-ALL-WORDS-WITH-ZERO-REFERENCE-COUNT
     REPEAT

     That's correct,  but  I  don't  like  it.  My  metacompiler  uses  the
recursive scheme of marking of the used words. The procedure is:

     : MARK ( word --- )
         IS-THIS-WORD-MARKED? NOT
         IF MARK-THIS-WORD
            MARK-DOES-PART-OF-DEFINING-WORD
            BEGIN
                ARE-THERE-MORE-REFERENCES? WHILE
                TAKE-NEXT-REFERENCE MARK
            REPEAT
         THEN DROP ;

     TAKE-REFERENCE-TO-ROOT MARK

     It is easily seen, that the details of this algorithm depends on class
of the  word to  be processed  (or on the method of its definition: by ":",
"VARIABLE", "CODE",  etc.) So  it looks  very  natural  to  have  the  MARK
operation for  each defining  word -  and to  EXECUTE it from the main MARK
procedure  indirectly  during  compilation  (like  DOES  operation  of  the
defining word  is executed  indirectly during  run of the program). So each
defining word  becomes an object with the standard set of operation (we use
some other  operations, such  as MERGE,  PRINT, etc.)  and all  the set  of
defining words - a metaclass in the terms of object-oriented programming.

     Mitch Bradley writes in his reply:

>
>>    Method 2.
>>
>>    The stripper compresses a running forth system.
>
>That's how mine works.
>
>>    and fixes up all references to this words.  Obviously, this
>>    part of the task is more difficult if you opt for subroutine
>>    threading and inline code.
>
>Not really.  Any threading technique other than token threading is about
>equally difficult to relocate the references. It helps a lot to have a
>relocation bitmap identifying which memory locations contain pointers.
>
     I'm sure  that it  really helps.  But... Hi,  Mitch: I  have  a  small
question for  you. Who  will build this bitmap for the word, defined by the
user defined definition word? I can imagine a built-in scheme for the colon
definitions compilation.  The only way I see to create a correct relocation
bitmap in  any case  is a  tagged stack  with @ and ! (and all other memory
access operations)  reading and  updating the bitmap, and stack operations,
computing a  type of  result  (for  example,  address+integer=address,  and
integer+integer=integer. But  what will be address+address?). I'm sure that
it isn't what we want.

     My opinion  is that  here is  the same  problem -  the problem  of the
"compile-time interpretation"  of the  definition words.  And the only real
way of  the solving  of this problem is development of the CREATE ... DOES>
construction to  the real  class definition.  I think  that solving of this
problem has  a great  meaning,  because  the  metacompilers  are  the  most
nonstandard (and  nonportable) components  of all  FORTH-based  development
systems. Isn't it a time to discuss metacompilation in the future ANS-FORTH
standard?

-- 
-- Igor Agamirzian

Office: +7(812)350-2523    Home: +7(812)314-6055    Fax: +7(812)217-5105
Address: 37 Rackova Str. # 4, Leningrad 191011 U.S.S.R.