andrea@mprgate.mpr.ca (Jennitta Andrea) (10/12/90)
I have a parser which is intended to be called repeatedly from a driver program. The driver program extracts a message from the incoming stream, and then calls yacc to parse a single message. Runtime analysis of this program indicates that the data space utilization increases by approximately the size of a message each time the parser is called. Obviously, this is not a Good Thing. I have typed the value stack to contain pointers to characters. I malloc the memory required for each token in lex (assigning yylval to point to that block of memory) before returning the token to yacc. It appears that this memory is not cleaned up when yacc frees the value stack. Am I required to explicitly free each token once the parser has reduced a rule? (I have not seen this done in any of the yacc examples I've looked at) Is there a 'hook' in yacc for me to specify the automatic freeing of programmer-typed tokens when the stack is reduced? Has anyone else out there experienced this problem? (and found a solution!) Jennitta Andrea | Voice : (604) 293-5362 MPR Teltech Ltd. | Fax : (604) 293-5787 8999 Nelson Way, Burnaby, BC | E-Mail: andrea@eric.mpr.ca Canada, V5A 4B5 | eric.mpr.ca!andrea@uunet.uu.net [Neither yacc nor lex does any garbage collection. Either you have to keep track of tokens as you go along, and free them as they're reduced, or else deal with them some other way. I have used an allocator which gets space from malloc in the usual way but chains all the allocated space together. Then after the parser is done, I can zip back up the chain and free it all. -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
bliss@sp64.csrd.uiuc.edu (Brian Bliss) (10/19/90)
In article <2371@kiwi.mpr.ca>, andrea@mprgate.mpr.ca (Jennitta Andrea) writes: |> |> I have a parser which is intended to be called repeatedly from a driver |> program. ... |> |> I have typed the value stack to contain pointers to characters. I |> malloc the memory required for each token in lex (assigning yylval to |> point to that block of memory) before returning the token to yacc. It |> appears that this memory is not cleaned up when yacc frees the value stack. |> |> Am I required to explicitly free each token once the parser has reduced |> a rule? ... 3 ways around this: 1) modify yyparse, the file that is included with the yacc-generated tables to form the parser, to free the tokens when popping the stack. 2) don't use the yacc value stack, make your own, and free the tokens when popping it. 3) merge the lexer into the parser and use alloca () to allocate the memory. instead of having lex recognize "xyz", allocate the string "xyz", and return the token XYZ, include the yacc rule: XYZ: 'x' 'y' 'z' { char *a = alloca (4); sprintf (a, "xyz"); push (a); } all strings will be automatically freed when yyparse () returns. this can get messy if the lex rules are complicated, though. bb -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
djones@megatest.uucp (Dave Jones) (10/21/90)
In article <2371@kiwi.mpr.ca>, andrea@mprgate.mpr.ca (Jennitta Andrea) writes: > > I have a parser which is intended to be called repeatedly from a driver > program. ... > > I have typed the value stack to contain pointers to characters. I > malloc the memory required for each token in lex (assigning yylval to > point to that block of memory) before returning the token to yacc. It > appears that this memory is not cleaned up when yacc frees the value stack. > > Am I required to explicitly free each token once the parser has reduced > a rule? ... > Yes. But you probably don't really want to use malloc directly for this purpose, anyway. One way is to write or borrow an expandable string-table package. When you are through with the strings, free them all at one go. But why copy the tokens at all? A technique I have used when extreme speed was wanted is to read the entire input file into a buffer at the start. Then a token is coded for with a structure containing a pointer into the file-buffer, and a character count. The structures, being all the same size, can be allocated in blocks of a thousand or so, and kept in free-lists for re-use rather than returned with free(). Actual example follows. The pointer into the buffer and the character count are the second and third fields respectively. typedef struct token { int type; /* as per gram.y and y.tab.h */ char* token_image; /* pointer to token within line */ int length; /* length of token */ char* line_image; /* \n terminated line containing token */ int line_num; /* line number within source file */ struct src *src_file; /* file containing the token */ int hval; /* hash-value for identifiers... */ } Token; -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
pardo@cs.washington.edu (David Keppel) (10/24/90)
>andrea@mprgate.mpr.ca (Jennitta Andrea) writes: >> [Call the parser repeatedly from a driver.] >> [|malloc| each token in lex; yacc doesn't |free|.] djones@megatest.uucp (Dave Jones) writes: >[Borrow an string table package; free at one go. > But why copy at all? Read input into a buffer & pools of tokens.] Ah, yes -- The way GNU CC does it (at least recently) is to use an ``obstack'' package. The ``obstack'' package is sort of a cross between |malloc| and |alloca|. Things are allocated in stack order on an ``obstack'' and are freed ``when you know it is safe.'' In addition, it is possible to tentatively allocate a thing on the stack, and then grow it or shrink it until its size is known, then ``freeze'' it and go on to the next allocation. Furthermore, if you have things that need to live until the end of the parse, but aren't needed as long as the parse tree, you can allocate stuff on a shorter-lived store. obstack_init (&short_store); obstack_init (&long_store); short_watermark = oballoc (&short_store, 0); long_watermark = oballoc (&long_store, 0); while (1) { parse(); /* Free temps used by the parser but not used in parse tree. */ obfree (&short_store, short_watermark); use (parser_output); /* Free everything allocaetd by the parser. */ obfree (&long_store, long_watermark); ... other stuff not needing parser output ... } Dave Jones' sollution is probably fastest, while using a generic allocator allows for more efficient use of memory when there are variable-sized objects. ;-D oN ( HACC memory management ) Pardo -- pardo@cs.washington.edu {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.