[comp.compilers] Question About YACC's Memory Management

andrea@mprgate.mpr.ca (Jennitta Andrea) (10/12/90)

  I have a parser which is intended to be called repeatedly from a driver
program.  The driver program extracts a message from the incoming stream,
and then calls yacc to parse a single message.  Runtime analysis of this
program indicates that the data space utilization increases by
approximately the size of a message each time the parser is called.
Obviously, this is not a Good Thing.

  I have typed the value stack to contain pointers to characters.  I
malloc the memory required for each token in lex (assigning yylval to
point to that block of memory) before returning the token to yacc.  It
appears that this memory is not cleaned up when yacc frees the value
stack.

  Am I required to explicitly free each token once the parser has reduced
a rule?  (I have not seen this done in any of the yacc examples I've
looked at) Is there a 'hook' in yacc for me to specify the automatic
freeing of programmer-typed tokens when the stack is reduced?  Has anyone
else out there experienced this problem? (and found a solution!)
  
Jennitta Andrea                 | Voice : (604) 293-5362
MPR Teltech Ltd. 		| Fax   : (604) 293-5787
8999 Nelson Way, Burnaby, BC    | E-Mail: andrea@eric.mpr.ca
Canada, V5A 4B5                 |         eric.mpr.ca!andrea@uunet.uu.net
[Neither yacc nor lex does any garbage collection.  Either you have to
keep track of tokens as you go along, and free them as they're reduced, or
else deal with them some other way.  I have used an allocator which gets
space from malloc in the usual way but chains all the allocated space
together.  Then after the parser is done, I can zip back up the chain and
free it all. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

bliss@sp64.csrd.uiuc.edu (Brian Bliss) (10/19/90)

In article <2371@kiwi.mpr.ca>, andrea@mprgate.mpr.ca (Jennitta Andrea) writes:
|> 
|>   I have a parser which is intended to be called repeatedly from a driver
|> program.  ...
|> 
|>   I have typed the value stack to contain pointers to characters.  I
|> malloc the memory required for each token in lex (assigning yylval to
|> point to that block of memory) before returning the token to yacc.  It
|> appears that this memory is not cleaned up when yacc frees the value stack.
|> 
|>   Am I required to explicitly free each token once the parser has reduced
|> a rule?  ...

3 ways around this:

1) modify yyparse, the file that is included with the yacc-generated
   tables to form the parser, to free the tokens when popping the stack.

2) don't use the yacc value stack, make your own, and free the tokens
   when popping it.

3) merge the lexer into the parser and use alloca () to allocate the
   memory.  instead of having lex recognize "xyz", allocate the string
   "xyz", and return the token XYZ, include the yacc rule:

   XYZ:    'x' 'y' 'z' {
              char *a = alloca (4);
              sprintf (a, "xyz");
              push (a);
            }

    all strings will be automatically freed when yyparse () returns.
    this can get messy if the lex rules are complicated, though.

bb
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

djones@megatest.uucp (Dave Jones) (10/21/90)

In article <2371@kiwi.mpr.ca>, andrea@mprgate.mpr.ca (Jennitta Andrea) writes:
> 
>   I have a parser which is intended to be called repeatedly from a driver
> program.  ...
> 
>   I have typed the value stack to contain pointers to characters.  I
> malloc the memory required for each token in lex (assigning yylval to
> point to that block of memory) before returning the token to yacc.  It
> appears that this memory is not cleaned up when yacc frees the value stack.
> 
>   Am I required to explicitly free each token once the parser has reduced
> a rule?  ...
> 

Yes.

But you probably don't really want to use malloc directly for this purpose,
anyway.

One way is to write or borrow an expandable string-table package.
When you are through with the strings, free them all at one go.

But why copy the tokens at all? A technique I have used when extreme
speed was wanted is to read the entire input file into a buffer at the
start. Then a token is coded for with a structure containing a pointer
into the file-buffer, and a character count. The structures, being all
the same size, can be allocated in blocks of a thousand or so, and kept
in free-lists for re-use rather than returned with free().

Actual example follows. The pointer into the buffer and the character
count are the second and third fields respectively.

typedef struct token
{ int    type;          /* as per gram.y and y.tab.h */
  char*  token_image;   /* pointer to token within line  */
  int    length;        /* length of token */
  char*  line_image;    /* \n terminated line containing token */
  int    line_num;      /* line number within source file */
  struct src *src_file; /* file containing the token */
  int    hval;          /* hash-value for identifiers... */
}
Token;
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

pardo@cs.washington.edu (David Keppel) (10/24/90)

>andrea@mprgate.mpr.ca (Jennitta Andrea) writes:
>> [Call the parser repeatedly from a driver.]
>> [|malloc| each token in lex; yacc doesn't |free|.]

djones@megatest.uucp (Dave Jones) writes:
>[Borrow an string table package; free at one go.
> But why copy at all?  Read input into a buffer & pools of tokens.]

Ah, yes -- The way GNU CC does it (at least recently) is to use an
``obstack'' package.  The ``obstack'' package is sort of a cross
between |malloc| and |alloca|.  Things are allocated in stack order on
an ``obstack'' and are freed ``when you know it is safe.''  In
addition, it is possible to tentatively allocate a thing on the stack,
and then grow it or shrink it until its size is known, then ``freeze''
it and go on to the next allocation.

Furthermore, if you have things that need to live until the end of the
parse, but aren't needed as long as the parse tree, you can allocate
stuff on a shorter-lived store.

    obstack_init (&short_store);
    obstack_init (&long_store);
    short_watermark = oballoc (&short_store, 0);
    long_watermark = oballoc (&long_store, 0);

    while (1) {
	parse();
	/* Free temps used by the parser but not used in parse tree.  */
	obfree (&short_store, short_watermark);

	use (parser_output);
	/* Free everything allocaetd by the parser. */
	obfree (&long_store, long_watermark);

	... other stuff not needing parser output ...
    }

Dave Jones' sollution is probably fastest, while using a generic
allocator allows for more efficient use of memory when there are
variable-sized objects.

	;-D oN  ( HACC memory management )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.