[comp.lang.c] using lex with strings, not files

leavitt@mordor.hw.stratus.com (Will Leavitt) (05/04/91)

Hi-
   I'd like to use lex & yacc to parse strings within an application, but they
seem to be hardwired to take their input from stdin.  What is the canonical 
way to get them to work on a string?  Thanks!

     -will

leavitt@mordor.hw.stratus.com


--
...............................................................................
this content-free posting has been brought to you by:
                                           Will Leavitt     508-490-6231
                                           leavitt@mordor.hw.stratus.com 

leavitt@mordor.hw.stratus.com (Will Leavitt) (05/06/91)

Thank you for all the suggestions.  I had come up with various kludges using 
freopen, but this is much cleaner:

from: Bradley White <bww+@K.GP.CS.CMU.EDU>

%%
char *instr;

#undef  input
#define input()         (*instr ? *instr++ : *instr)

#undef  unput
#define unput(c)        ((c), --instr)
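
For anyone who wants to see what those two macros do on their own, here is a small sketch in plain C, outside of lex. The helpers set_input() and peek_char() are made up for this illustration and are not part of lex; the macros follow the same idea as the ones above. Note that input() does not advance once it has reached the end of the string, so an unput() after end-of-input would step back one character too far; peek_char() guards against that.

```c
#include <assert.h>
#include <stddef.h>

/* The string being scanned; same role as instr in the posted snippet. */
static const char *instr;

/* Return the next character, or 0 at the end of the string.
 * At the end the pointer is deliberately not advanced. */
#define input()   (*instr ? (unsigned char)*instr++ : 0)

/* Push back the most recently read character by stepping back. */
#define unput(c)  ((void)(c), (void)--instr)

/* Hypothetical helper: point the macros at a new string. */
static void set_input(const char *s)
{
    assert(s != NULL);
    instr = s;
}

/* Hypothetical helper: look at the next character without consuming it. */
static int peek_char(void)
{
    int c = input();
    if (c != 0)
        unput(c);   /* only back up if input() actually advanced */
    return c;
}
```

With the string "ab", peek_char() yields 'a' without consuming it, and successive input() calls then yield 'a', 'b' and 0.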


shap@shasta.Stanford.EDU (shap) (05/07/91)

In article <5384@lectroid.sw.stratus.com> leavitt@mordor.hw.stratus.com (Will Leavitt) writes:
>
>   I'd like to use lex & yacc to parse strings within an application, but they
>seem to be hardwired to take their input from stdin.  What is the canonical 
>way to get them to work on a string?  Thanks!

Yacc doesn't read the input directly, so there's no work there.  My
recollection is that lex uses two macros: GET() and UNGET() to
obtain/pushback characters.  If you take a look at the lex-generated C
code you will spot them. 

What you need to do is supply your own version of these macros at the
top of your file.

Jonathan

wollman@emily.uvm.edu (Garrett Wollman) (05/12/91)

In article <187@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>Yacc doesn't read the input directly, so there's no work there.  My
>recollection is that lex uses two macros: GET() and UNGET() to
>obtain/pushback characters.  If you take a look at the lex-generated C
>code you will spot them. 
>
>What you need to do is supply your own version of these macros at the
>top of your file.
>
>Jonathan

Well, sort of.  Flex, for one, does not allow the user to redefine GET
and UNGET; with the way flex scanners work, that would mean an
*extremely* serious performance hit.  [In fact, the major performance
feature of flex is the fact that it uses read() to read a block,
rather than reading a line at a time like conventional lex does.]

Thankfully, you can spot a flex scanner very easily in your code...

#ifdef FLEX_SCANNER
/* flex specific code here */
#else
/* old slow lex specific code here */
#endif

But, flex uses a macro to do this read()ing, so that, without too
much hassle, you can write a string-scanner that works correctly under
both flex and lex.  Your users will thank you for it.

In particular, you can redefine the following macro (taken from a
2.1-beta skeleton):
/* gets input and stuffs it into "buf".  number of characters read, or YY_NULL,
 * is returned in "result".
 */
#define YY_INPUT(buf,result,max_size) \
	if ( (result = read( fileno(yyin), buf, max_size )) < 0 ) \
	    YY_FATAL_ERROR( "read() in flex scanner failed" );

to something like this:

#define YY_INPUT(buf,result,max_size) \
    { \
	int len = strlen(my_string); \
	if(!len) { \
	    result = 0;  /* end of string: report EOF (YY_NULL) */ \
	} else { \
	    result = (len < max_size) ? len : max_size; \
	    strncpy(buf,my_string,result); \
	    my_string += result;  /* skip what was just copied */ \
	} \
    }
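
The chunked copy in that macro can be tried outside a scanner. Below is a minimal sketch of the same logic as an ordinary C function; string_input() is a hypothetical name, and my_string is assumed to be a pointer to whatever text remains to be scanned. It copies at most max_size bytes per call and returns 0 once the string is exhausted, which is what flex treats as end of input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remaining unscanned text (assumed name, as in the macro above). */
static const char *my_string = "";

/* Copy up to max_size bytes of the remaining string into buf and
 * return how many were copied; 0 means end of input. */
static int string_input(char *buf, int max_size)
{
    size_t len;
    size_t n;

    assert(buf != NULL && max_size > 0);
    len = strlen(my_string);
    n = (len < (size_t)max_size) ? len : (size_t)max_size;
    memcpy(buf, my_string, n);  /* no terminating NUL is written */
    my_string += n;             /* advance past the copied bytes */
    return (int)n;
}
```

In a flex file one could then presumably write `#define YY_INPUT(buf,result,max_size) (result = string_input(buf, max_size))` in the definitions section; that wiring is an assumption and has not been checked against the 2.1-beta skeleton.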

[there are probably some errors... in which case please remember that
it's now 1:15 in the morning here.]

-GAWollman

Garrett A. Wollman - wollman@emily.uvm.edu

Disclaimer:  I'm not even sure this represents *my* opinion, never
mind UVM's, EMBA's, EMBA-CF's, or indeed anyone else's.