schmidtg@iccgcc.decnet.ab.com (02/08/91)
Some time ago, in a posting regarding metacompilation, a passing reference
was made to a "stripper" program. A stripper removes all words not used
within a given application. Has anybody developed/used such a program?
If so, I would be interested in learning more about it. Here are some of
my own thoughts/questions about how a stripper might work.
Method 1.
The stripper works in conjunction with the metacompiler. The
metacompiler is modified to keep a reference count for each
defined word in the symbol table. After metacompilation, if
the reference count is zero, the word is defined but never
referenced, and may therefore be eliminated in a subsequent
metacompilation. Now comes the fun part. The metacompilation
is rerun and these words are ignored. How is this done? One
way might be to have the symbol table from the previous run
provide input to the next run. When the metacompiler encounters
a defining word, it checks the "old" symbol table to see if
it has been referenced. If not, it redirects output from the
target image to the bit bucket until the next defining word
is encountered. Has anybody tried a scheme like this?
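The two-pass scheme above can be sketched in a few lines of Python (not Forth); the dict-of-references "program", the word names, and the explicit root `TASK` are all invented for illustration. A real metacompiler would gather the reference counts while compiling source, then redirect unreferenced definitions to the bit bucket on the rerun.

```python
# Toy sketch of Method 1: first pass counts references, second pass
# "recompiles" while discarding every word whose count stayed at zero.
# A program is modeled as word -> list of words its body references.

def count_references(program):
    """First pass: count how often each defined word is referenced."""
    counts = {word: 0 for word in program}
    for refs in program.values():
        for ref in refs:
            if ref in counts:
                counts[ref] += 1
    return counts

def strip_pass(program, root="TASK"):
    """Second pass: keep only referenced words (and the root itself)."""
    counts = count_references(program)
    return {w: refs for w, refs in program.items()
            if counts[w] > 0 or w == root}

program = {
    "SQUARE": ["DUP", "*"],            # used by TASK
    "CUBE":   ["DUP", "SQUARE", "*"],  # defined but never referenced
    "TASK":   ["SQUARE", "."],         # root word of the application
}
print(sorted(strip_pass(program)))     # CUBE goes to the bit bucket
```

Note that one rerun only removes words whose count was already zero; a word referenced solely by dead words survives the first strip, so the process may need repeating until nothing changes.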
Method 2.
The stripper compresses a running forth system. The forth
system is loaded along with the application. The stripper
program is then loaded; its first word is "TASK". A list
of application words is given to the stripper (e.g. STRIP FOO
STRIP BAR STRIP BLETCH). The stripper recursively examines
the call tree ("used tree?") of each word and adds each word
encountered to its symbol table. Starting from "TASK" and
working towards lower memory, look at each word and determine
if the word is in the symbol table. If it is, do nothing and
go on to the next word. If not, and there are no other
referenced words above this one, advance a pointer which
points to the last word in the system. This effectively
"deletes" this word. Now for some real fun. If the word is
not referenced, but other referenced words are above it, then
shift all the words above this one down in memory by the size
of this word. Doing this of course invalidates all other words
which use this word. If your system is token threaded (I'm
certain it's not) you can just change a single entry in the
token list. Otherwise the stripper now parses the entire
program (or rather what's left of it after a partial strip)
and fixes up all references to this word. Obviously, this
part of the task is more difficult if you opt for subroutine
threading and inline code. Also, when the words are "shifted",
the LFA of the previous shifted word must change too. This
process is continued until the root word is reached. Any
vocabulary pointers must now be changed and the new system is
then saved out to disk. A disadvantage of this method is that
it requires enough memory to hold the unstripped application
plus the stripper program.
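The shift-and-fix-up step of Method 2 can be sketched as follows, again in Python rather than in a real Forth image. The word layout (name, size, references-by-name) and the `compact` helper are invented for illustration; in a live system the "fix up" step would rewrite threaded-code pointers (and LFA links) in place rather than rebuild a table.

```python
# Toy of Method 2's compaction: words occupy a linear dictionary in
# memory order; dead words are squeezed out, surviving words get new
# addresses, and every reference is patched to the new address.

def compact(words, keep):
    """words: list of (name, size, refs) in memory order, refs by name.
    Returns survivors as (name, new_address, patched_ref_addresses)."""
    survivors = [w for w in words if w[0] in keep]
    # Recompute each survivor's address after the holes close up.
    addr, layout = 0, {}
    for name, size, refs in survivors:
        layout[name] = addr
        addr += size
    # "Fix up" references: map every referenced name to its new address.
    return [(name, layout[name], [layout[r] for r in refs])
            for name, size, refs in survivors]

words = [("SQUARE", 16, []),
         ("CUBE",   24, ["SQUARE"]),   # dead: shifted out of the image
         ("TASK",   32, ["SQUARE"])]
print(compact(words, {"SQUARE", "TASK"}))
```

After `CUBE` is removed, `TASK` slides down by 24 bytes and its reference to `SQUARE` still resolves, which is the whole point of the fix-up pass.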
What do people think of these methods? Are there better ways to do this
aside from what I have suggested? There may be some holes in my ideas, too!
Let me know what you think.
--
=============================================================================
Greg Schmidt -> schmidtg@iccgcc.decnet.ab.com
=============================================================================
"People with nothing to hide have nothing to fear from O.B.I.T"
-- Peter Lomax
-----------------------------------------------------------------------------
Disclaimer: No warranty is expressed or implied. Void where prohibited.
=============================================================================

UNBCIC@BRFAPESP.BITNET (02/10/91)
Take a look at TCOM. At Simtel, it's located in FPC353-5.ZIP. Maybe FPC353-4
too.
(8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET

wmb@MITCH.ENG.SUN.COM (02/12/91)
> was made to a "stripper" program. A stripper removes all words not used
> within a given application. Has anybody developed/used such a program?

Yes, many vendors have such a tool.

> Method 2.
>
> The stripper compresses a running forth system.

That's how mine works.

> and fixes up all references to this word. Obviously, this
> part of the task is more difficult if you opt for subroutine
> threading and inline code.

Not really. Any threading technique other than token threading is about
equally difficult to relocate the references. It helps a lot to have a
relocation bitmap identifying which memory locations contain pointers.

> A disadvantage of this method is that
> it requires enough memory to hold the unstripped application
> plus the stripper program.

The stripper program, although rather complex, is not all that big.

Mitch Bradley, wmb@Eng.Sun.COM
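The relocation bitmap Bradley mentions can be illustrated with a small Python toy; the flat cell array, the bitmap-as-list, and the `relocate` helper are assumptions made for the sketch, not any vendor's actual layout. One bit per cell records whether that cell holds a pointer, so after a word is deleted only flagged cells are patched, and a literal number that merely looks like an address is left alone.

```python
# Sketch of pointer fix-up driven by a relocation bitmap: delete the
# cells of a dead word, then adjust only those cells flagged as
# pointers whose targets lay beyond the deleted region.

def relocate(cells, is_pointer, cut_start, cut_len):
    """Remove cells[cut_start:cut_start+cut_len]; patch pointer cells."""
    out, bits = [], []
    for i, (cell, ptr) in enumerate(zip(cells, is_pointer)):
        if cut_start <= i < cut_start + cut_len:
            continue                    # the deleted word's own cells
        if ptr and cell >= cut_start + cut_len:
            cell -= cut_len             # pointer past the hole slides down
        out.append(cell)
        bits.append(ptr)
    return out, bits

# Image: cell 0 holds the literal 5; cell 1 points at cell 4;
# cells 2-3 belong to a dead word being stripped.
cells      = [5, 4, 99, 99, 7]
is_pointer = [False, True, False, False, False]
print(relocate(cells, is_pointer, 2, 2))   # literal 5 untouched, pointer patched
```

Without the bitmap the stripper would have to parse the whole image to tell pointers from literals, which is exactly the hard part of the non-token-threaded case.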
koopman@a.gp.cs.cmu.edu (Philip Koopman) (02/16/91)
> Method 1.
>
> The stripper works in conjunction with the
> metacompiler. ...

That's a nice method, and it's used in the AstroFORTH automatic target
compiler. The resulting code consists only of those words which are used
(directly or not) by the main (root) word of the program being compiled.
There are two remarks:

1. It is impossible (or, maybe, too complex to be useful) to handle an
unrestricted CREATE ... DOES> construction (though this is a common
problem of metacompilation). The data structure created by a defining
word may include pointers to some words, and nobody knows about it except
the DOES> part. OK, a sufficiently egg-headed metacompiler can probably
understand something in DOES>, but what about ;CODE ?

2. The reference-counting algorithm you describe is incomplete. Imagine:

   : A ... ;
   : B ... A ... ;
   : C ... ;

where C is the word to be compiled. The reference count of A will be
nonzero, though it will be zero for B. So the code for C would include
the code for A, which is not used (and can be much larger than the code
for C). The correct algorithm would be something like this:

   BEGIN
      LOOK-THROUGH-THE-SOURCE-AND-SET-REFERENCE-COUNTS
      ARE-THERE-WORDS-WITH-ZERO-REFERENCE-COUNT?
   WHILE
      REMOVE-ALL-WORDS-WITH-ZERO-REFERENCE-COUNT
   REPEAT

That's correct, but I don't like it. My metacompiler uses a recursive
scheme of marking the used words. The procedure is:

   : MARK  ( word --- )
      IS-THIS-WORD-MARKED? NOT
      IF  MARK-THIS-WORD
          MARK-DOES-PART-OF-DEFINING-WORD
          BEGIN  ARE-THERE-MORE-REFERENCES?
          WHILE  TAKE-NEXT-REFERENCE  MARK
          REPEAT
      THEN  DROP ;

   TAKE-REFERENCE-TO-ROOT  MARK

It is easy to see that the details of this algorithm depend on the class
of the word being processed (or on the method of its definition: by ":",
"VARIABLE", "CODE", etc.). So it looks very natural to have a MARK
operation for each defining word - and to EXECUTE it from the main MARK
procedure indirectly during compilation (just as the DOES> operation of
the defining word is executed indirectly during the run of the program).
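The recursive marking scheme can be sketched in Python (standing in for the Forth MARK pseudocode); the dict-based call graph and word names are invented for the example. Reachability from the root correctly drops A when its only referencer B is itself dead, the very case where one round of reference counting fails.

```python
# Depth-first marking from the root word, mirroring the MARK procedure:
# a word is marked once, then each of its references is marked in turn.

def mark(word, program, marked):
    """Recursively mark `word` and everything it references."""
    if word in marked or word not in program:
        return                          # already marked, or a primitive
    marked.add(word)
    for ref in program[word]:           # "take next reference" loop
        mark(ref, program, marked)

program = {"A": [], "B": ["A"], "C": []}   # B uses A; C is the root
marked = set()
mark("C", program, marked)
print(sorted(marked))                   # only C survives: A and B are dead
```

Reference counting alone would keep A (its count is 1, from dead B); the recursive mark never reaches it, so a single pass suffices with no fixpoint loop.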
So each defining word becomes an object with a standard set of operations
(we use some other operations as well, such as MERGE, PRINT, etc.), and
the whole set of defining words becomes a metaclass in the terms of
object-oriented programming.

Mitch Bradley writes in his reply:

>> Method 2.
>>
>> The stripper compresses a running forth system.
>
> That's how mine works.
>
>> and fixes up all references to this word. Obviously, this
>> part of the task is more difficult if you opt for subroutine
>> threading and inline code.
>
> Not really. Any threading technique other than token threading is about
> equally difficult to relocate the references. It helps a lot to have a
> relocation bitmap identifying which memory locations contain pointers.

I'm sure that it really helps. But... Hi, Mitch, I have a small question
for you: who will build this bitmap for a word defined by a user-defined
defining word? I can imagine a built-in scheme for compiling colon
definitions. The only way I see to create a correct relocation bitmap in
every case is a tagged stack, with @ and ! (and all other memory access
operations) reading and updating the bitmap, and with stack operations
computing the type of each result (for example, address+integer=address
and integer+integer=integer; but what would address+address be?). I'm
sure that this isn't what we want.

My opinion is that this is the same problem - the problem of
"compile-time interpretation" of defining words. And the only real way of
solving it is to develop the CREATE ... DOES> construction into a real
class definition. I think that solving this problem matters a great deal,
because metacompilers are the most nonstandard (and nonportable)
components of all FORTH-based development systems. Isn't it time to
discuss metacompilation in the future ANS-FORTH standard?

--
-- Igor Agamirzian
Office: +7(812)350-2523   Home: +7(812)314-6055   Fax: +7(812)217-5105
Address: 37 Rackova Str. # 4, Leningrad 191011 U.S.S.R.
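The tagged-stack scheme Agamirzian is skeptical of can be made concrete with a small Python toy; the `(value, tag)` representation and the `tagged_add` helper are invented for the sketch. The undefined ADDR+ADDR case is exactly his objection: the type algebra has holes.

```python
# Toy of a tagged stack: every item carries a type tag, and arithmetic
# must infer the tag of its result so the relocation bitmap can later
# be built from what @ and ! see. ADDR + ADDR has no sensible answer.

ADDR, INT = "addr", "int"

def tagged_add(a, b):
    """Add two (value, tag) stack items, inferring the result's tag."""
    (va, ta), (vb, tb) = a, b
    if ta == INT and tb == INT:
        return (va + vb, INT)           # integer + integer = integer
    if {ta, tb} == {ADDR, INT}:
        return (va + vb, ADDR)          # address + offset is an address
    raise TypeError("address + address has no meaningful type")

print(tagged_add((100, ADDR), (4, INT)))   # (104, 'addr')
```

Even this toy shows why he concludes "this isn't what we want": every primitive would need such rules, and some combinations simply cannot be assigned a type.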