john@basho.uucp (John Lacey) (08/10/90)
Normally, of course, one wants a scanner (and a parser) to work from
a file, perhaps stdin. Sigh. Well, I want one that works from a string.
I am using Flex 2.3, and Bison 1.11. I tried the following few #define's:
    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \
      { \
      for ( result = 0; *ch_this && result < max_size; result ++ ) \
        buf[result] = *ch_this++; \
      }
    #define YY_USER_INIT \
      if ( scan_init ) { \
        if ( yy_flex_debug ) \
          printf ( "-- initializing for scan %d\n", scan_init ); \
        ch_this = inbuffer; \
        scan_init = 0; }
with the following couple of definitions and declarations in the scanner:
    static char * ch_this;
    extern char * inbuffer;
    extern int scan_init;
and with inbuffer and scan_init defined in the code that calls yyparse().
This didn't work. Well, actually, it works the first time yyparse() is
called, but not again. Now, YY_USER_INIT is used inside an if statement
that checks yy_init, so I moved it out of there in the scanner skeleton
so that YY_USER_INIT is seen every time the scanner is called. Still
no go.
Has anyone done this, or see a way to do it, or know a way to do it, or ....
Thanks.
--
John Lacey,
E-mail: ...!osu-cis!n8emr!uncle!basho!john (coming soon: john@basho.uucp)
V-mail: (614) 436--3773, or 487--8570
"What was the name of the dog on Rin-tin-tin?" --Mickey Rivers, ex-Yankee CF
ptb@ittc.wec.com (Pat Broderick) (08/10/90)
In article <1990Aug10.012927.5558@basho.uucp>, john@basho.uucp (John Lacey) writes:
> Normally, of course, one wants a scanner (and a parser) to work from
> a file, perhaps stdin. Sigh. Well, I want one that works from a string.
> ...

Recently I had occasion to do something similar. What we did was roughly
as follows:

  - strings to be parsed are maintained in memory
  - to parse a string a global pointer known to lex is set to point at
    the beginning of the string
  - the input() macro was redefined in terms of this pointer (standard
    uses getc(yyin))

The things needed might look something like:

LEX:

    # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
    /* standard defn from lex */

    # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):(*yynyy++))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
                                                           ^^^^^^^^^^
    /* modified defn to use string */

    extern char *yynyy;    /* will pt to start of string */

Function invoking parser:

    char *yynyy;    /* globally visible */
    ....
    yynyy = start_of_string;
    yyparse();

This works fine for us, hope it helps.
--
Patrick T. Broderick   |ptb@ittc.wec.com |
                       |uunet!ittc!ptb   |
                       |(412)733-6265    |
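Stripped of the lex internals, the approach above amounts to a reader that pulls characters through a global pointer, which the caller re-aims before every parse. A minimal sketch in plain C follows. The names `src`, `next_char` and `count_words` are hypothetical, and `count_words` is only a stand-in for yyparse(), not anything lex or yacc generates.

```c
#include <ctype.h>

/* Plays the role of yynyy: points at the next unread character. */
static const char *src = "";

/* String-backed reader: returns 0 at the end of the string, which is
 * the value lex treats as end of input. The pointer is never advanced
 * past the terminating '\0'. */
int next_char(void)
{
    return *src ? (unsigned char)*src++ : 0;
}

/* Stand-in for yyparse(): counts whitespace-separated words. */
int count_words(const char *text)
{
    int c, words = 0, in_word = 0;

    src = text;                 /* re-aim the reader for this parse */
    while ((c = next_char()) != 0) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

A second call to count_words() works because the pointer is reset on entry, which corresponds to the `yynyy = start_of_string;` assignment made before each yyparse().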
bogatko@lzga.ATT.COM (George Bogatko) (08/10/90)
In article <1990Aug10.012927.5558@basho.uucp>, john@basho.uucp (John Lacey) writes:
> Has anyone done this, or see a way to do it, or know a way to do it, or ....

Put these lines in your lex file after the #include lines:

    %{
    #include <stdio.h>
    #include <y.tab.h>
    extern char *mis_ptr;
    #undef input
    #undef unput
    # define input() (*mis_ptr=='\n'?0:*mis_ptr++)
    # define unput(c) (*--mis_ptr=(c) )
    %}

Now have a char buffer called myinputstring:

    char myinputstring[100];

and do the following in main:

    char *mis_ptr;

    main()
    {
        for(;;) {
            /* fgets keeps the trailing '\n', which input() treats as end of input */
            if (fgets(myinputstring, sizeof(myinputstring), stdin) == NULL)
                break;
            mis_ptr = myinputstring;
            yylex();
        }
    }

I think you get the picture now?

GB
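The two macros above can be tried outside lex. The sketch below wraps the same logic in functions and uses the pushback the way a scanner does when it has read one character too many. The names `mis_input`, `mis_unput` and `read_number` are hypothetical, the extra check for '\0' is an addition to the posted macro, and the buffer is assumed to be a writable array because unput stores into it.

```c
#include <ctype.h>

/* Same role as mis_ptr in the post: the next unread character. */
char *mis_ptr;

/* Returns 0 (lex's end-of-input value) at a newline. It also stops at
 * '\0' so a line without a trailing newline cannot be overrun; that
 * check is an addition, not part of the posted macro. */
int mis_input(void)
{
    if (*mis_ptr == '\n' || *mis_ptr == '\0')
        return 0;
    return (unsigned char)*mis_ptr++;
}

/* Pushback: step back one place and store c there. */
void mis_unput(int c)
{
    *--mis_ptr = (char)c;
}

/* Hypothetical helper: reads a run of digits, then pushes back the
 * character that ended the run. Returns -1 if no digit was found. */
int read_number(void)
{
    int c, value = 0, digits = 0;

    while ((c = mis_input()) != 0 && isdigit(c)) {
        value = value * 10 + (c - '0');
        digits++;
    }
    if (c != 0)
        mis_unput(c);           /* we ate one character too many */
    return digits ? value : -1;
}
```

With mis_ptr aimed at a buffer holding "12+34\n", read_number() yields 12, the next mis_input() yields '+', and a second read_number() yields 34 before the newline ends the input.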
jal@valha1.ATT.COM (Joseph A. Leggio) (08/12/90)
From article <1990Aug10.012927.5558@basho.uucp>, by john@basho.uucp (John Lacey):
> Normally, of course, one wants a scanner (and a parser) to work from
> a file, perhaps stdin. Sigh. Well, I want one that works from a string.
>
> Has anyone done this, or see a way to do it, or know a way to do it, or ....
>
> --
> John Lacey,

I have used these "input" and "unput" routines in many programs where
I wanted complete control of the input stream. The example here uses
fgets to fill a character array from stdin, but you could fill it from
any source you wish. You only need point pointer "p" to the start of
the array each time you read a new line.

Only restriction: unput cannot back up past the start of a line.
(I have not found this to be a problem as I do not usually try to
match patterns which span multiple lines.)

I use System V Release 3 AT&T lex, "flex" might work the same, look
for the #defines for "input" and "unput" in your code.

==================================================
%%
Lex reg-expr's go here
%%
#define BUFFER_SIZE 1024
char *p;
char buf[BUFFER_SIZE];

main(){
    p = buf;    /* point "p" to start of buf for first line */
    while( fgets(buf, sizeof(buf), stdin) != NULL ) {    /* read line */
        yylex();    /* parse line */
        p = buf;    /* point "p" back to start of buf for next line */
    }
    exit(0);
}

#ifdef input
#undef input
#endif
#ifdef unput
#undef unput
#endif

/* replacement "input" routine for lex, uses char array "buf" */
char input()
{
    if ( p < buf + ( BUFFER_SIZE - 1 ) )
        return(*p++);
    else
        return((char)0);
}

/* replacement "unput" routine for lex, uses char array "buf" */
unput(c)
char c;
{
    if ( p > buf )
        *(--p) = c;
}
=============================================================

Joe Leggio WB2HOL
AT&T Customer Software Services
Valhalla, NY
att!valha1!jal
chris@mimsy.umd.edu (Chris Torek) (08/13/90)
(This topic probably belongs elsewhere; perhaps comp.lang.misc or
comp.unix.questions.... Ah well.)

In article <174@ittc.wec.com> ptb@ittc.wec.com (Pat Broderick) writes:
>The things needed might look something like:
># define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?
>    (yylineno++,yytchar):yytchar)==EOF?0:yytchar)
>/* standard defn from lex */
># define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):(*yynyy++))==10?
>    (yylineno++,yytchar):yytchar)==EOF?0:yytchar)
>                                                       ^^^^^^^^^^
>This works fine for us, hope it helps.

This should work, but is overkill. (It also does not address a question
whose answer I myself am unsure about.) Here is what is going on:

Lex (or Flex or other similar lexer of your choice) implements a DFSA
(Deterministic Finite State Automaton) that is simply an optimized (in
space if not time) variant on

    int table[NSTATES][128] = { huge amounts of junk };

    yylex() {
        int state = 0;    /* ignoring BEGINs that is */

        yyleng = 0;
        for (;;) {
            c = input();              /* eat the next char */
            yytext[yyleng++] = c;     /* store it in yytext */
            state = table[state][c];  /* find the next state */
            switch (state) {          /* and see what to do */
            ... cases that exactly match something ...
                do actions from C code;
            ... cases indicating we ate too much ...
                unput(some of the things we ate);
                do actions from C code;
            ... cases indicating `no match' ...
                output(the things we ate);
                break;
            }
        }
    }

One noteworthy thing about this is that lex can never unput() something
it has not input() `from the same place' (unless you put your own
unput() actions into your lexer: a dangerous practise). Thus, if you
are reading from a string in a buffer, your `unput' action can be much
simpler, and likewise your input() macro can be simplified:

    #define input() ((yytchar = *mystring++) == '\n' ? (yylineno++, yytchar) : \
                        yytchar)
    #define unput(c) (mystring--)

These two also take advantage of the fact that Lex wants `EOF' to be
the value 0, rather than the (implementation-defined but usually) -1
that stdio returns. The end of a C string is the character '\0' which
has the value 0.

The question I have is whether lex might call input() again after
reading EOF once. Since the end of a real file tends to remain the end
of the file no matter how many times it is read per second, it seems
possible that the implementation might invoke input() again after
input() returns 0 but without an intervening unput(0)---i.e., it may
depend on EOF being `sticky'. In this case the input macro must be
more careful:

    #define input() ((yytchar = *mystr++) == 0 ? (mystr--, 0) : \
        (yytchar == '\n' ? (yylineno++, '\n') : yytchar))

or as a GCC inline function:

    static inline input() {
        int c = *mystr++;

        if (c == 0)
            mystr--;
        else if (c == '\n')
            yylineno++;
        return c;
    }

This requires a corresponding change to unput(), however, since now
unput(0) should do nothing:

    #define unput(c) ((c) ? mystr-- : 0)

All of these eliminate the need for yysbuf (an array of size YYLMAX
that holds unput() characters since ungetc() only guarantees one
character of pushback).

Lex could be considerably more efficient by avoiding all this copying
of text from one place to another; I believe flex does this. This
usually means bypassing stdio, of course....
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
(New campus phone system, active sometime soon: +1 301 405 2750)
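The table-driven loop and the sticky-EOF input described above can be put together in a small self-contained scanner. This is only a sketch under stated assumptions: the three-state table, the token kinds, and the names `str_input`, `str_unput`, `scan_string`, `next_token`, `tok_text` and `tok_leng` are all hypothetical, and the table is written by hand rather than generated the way lex would do it.

```c
#include <ctype.h>

enum { C_DIGIT, C_ALPHA, C_OTHER, NCLASS };           /* character classes */
enum { S_START, S_NUM, S_WORD, NSTATE, S_STOP = -1 }; /* automaton states */
enum { T_EOF, T_NUM, T_WORD, T_OTHER };               /* token kinds */

/* next[state][class]: S_STOP means the token ended before this char. */
static const int next[NSTATE][NCLASS] = {
    /* S_START */ { S_NUM,  S_WORD, S_STOP },
    /* S_NUM   */ { S_NUM,  S_STOP, S_STOP },
    /* S_WORD  */ { S_WORD, S_WORD, S_STOP },
};

static const char *mystr = "";   /* string being scanned */
char tok_text[64];               /* plays the role of yytext */
int tok_leng;                    /* plays the role of yyleng */

/* Sticky EOF: never step past the terminating '\0'. */
static int str_input(void)
{
    int c = (unsigned char)*mystr++;
    if (c == 0)
        mystr--;
    return c;
}

/* Gives back only what str_input() just returned; unput of 0 is a no-op. */
static void str_unput(int c)
{
    if (c)
        mystr--;
}

static int classify(int c)
{
    return isdigit(c) ? C_DIGIT : isalpha(c) ? C_ALPHA : C_OTHER;
}

void scan_string(const char *s) { mystr = s; }

/* Returns the kind of the next token, leaving its text in tok_text. */
int next_token(void)
{
    int state = S_START, c, n;

    tok_leng = 0;
    for (;;) {
        c = str_input();                     /* eat the next char */
        n = next[state][classify(c)];        /* find the next state */
        if (n == S_STOP)
            break;
        if (tok_leng < (int)sizeof tok_text - 1)
            tok_text[tok_leng++] = (char)c;  /* store it in the token text */
        state = n;
    }
    if (state == S_START) {                  /* no rule matched here */
        if (c == 0) { tok_text[0] = '\0'; return T_EOF; }
        tok_text[tok_leng++] = (char)c;      /* pass the char through alone */
        tok_text[tok_leng] = '\0';
        return T_OTHER;
    }
    str_unput(c);                            /* we ate one char too many */
    tok_text[tok_leng] = '\0';
    return state == S_NUM ? T_NUM : T_WORD;
}
```

After scan_string("abc 42x"), successive calls to next_token() produce a word, a lone space, a number, a word, and then T_EOF for as long as the caller keeps asking, since the read pointer stays parked on the terminating '\0'.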
chris@mimsy.umd.edu (Chris Torek) (08/14/90)
In article <25996@mimsy.umd.edu> I suggested:
>#define unput(c) (mystring--)
(and then eventually)
>#define unput(c) ((c) ? mystr-- : 0)

The first of these will fail because lex uses unput as

    unput(*--yylastch)

and

    unput(*yylastch--)

Thanks to Brad White for noticing this error. The first one should be
`#define unput(c) ((c), mystring--)'. (This is what I get for making
changes to lex things based on theoretical arguments without checking
to see whether lex uses good programming practises :-) .)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
(New campus phone system, active sometime soon: +1 301 405 2750)
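The failure comes down to macro arguments being pasted in textually: if the macro body never mentions its parameter, any side effect in the argument is silently lost. The sketch below isolates that effect. The macro names `UNPUT_DROP` and `UNPUT_EVAL` and the two helper functions are hypothetical, the variables are stand-ins modeled on the names in the post, and the (void) cast is an addition that only quiets compiler warnings.

```c
/* Stand-ins modeled on the names in the post. */
static const char *mystring;   /* read position in the source string */
static char *yylastch;         /* end of the token text collected so far */

static const char source[] = "abc";

/* Broken form: the argument is dropped, so its side effects never run. */
#define UNPUT_DROP(c)  (mystring--)

/* Corrected form: evaluate the argument, then step back. */
#define UNPUT_EVAL(c)  ((void)(c), mystring--)

/* Pretends "abc" was read into a token buffer, then gives back the last
 * character the way lex does. Returns how many characters the token
 * buffer still claims to hold afterwards. */
int leftover_with_drop(void)
{
    char buf[8] = "abc";

    mystring = source + 3;      /* all three chars were read */
    yylastch = buf + 3;
    UNPUT_DROP(*--yylastch);    /* --yylastch is never evaluated */
    return (int)(yylastch - buf);
}

int leftover_with_eval(void)
{
    char buf[8] = "abc";

    mystring = source + 3;
    yylastch = buf + 3;
    UNPUT_EVAL(*--yylastch);    /* yylastch steps back as lex expects */
    return (int)(yylastch - buf);
}
```

With the dropped argument the token buffer still reports three characters even though the source pointer has moved back by one, so the two fall out of step; with the evaluated argument both move together.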