[comp.lang.c] LEX Question

tab@auc.UUCP (Terrence Brannon ) (02/11/89)

OK, I have written a small routine in lex, and if I compile
it (lex source, cc lex.yy.c) and then run it (a.out), it works fine,
accepting values.

My question is, is there any way that I could have the lex program accept
integers from somewhere other than stdin? I am hoping to be able to do
something along the lines of (result = lexroutine(num);)

In other words, is there any way to pass a value to the lex program instead
of it getting its value from stdin?

...gatech!auc!tab

henry@utzoo.uucp (Henry Spencer) (02/14/89)

In article <32240@auc.UUCP> tab@auc.UUCP (0-Terrence Brannon ) writes:
>My question is, is there any way that I could have the lex program accept
>integers from somewhere other than stdin? ...

If you read the fine print in the lex manual (assuming your vendor hasn't
done something stupid...) you'll find a section on how to replace lex's
i/o routines with your own.
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

najmi@hpcuhb.HP.COM (Farrukh Najmi) (02/14/89)

It's not clear to me what you are trying to do.
Lex produces a function called yylex() in lex.yy.c.
By default, this function reads from standard input.

However, one can make it read from a file (using a file pointer)
by declaring the file pointer variable and then assigning
it to yyin. The yyin assignment must come after the %%
and before any rules, and it must be indented so that lex
copies it into yylex() rather than treating it as a pattern.

%{
extern FILE *input_fileptr;
%}
%%
    yyin = input_fileptr;

/* rules go here. */
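For illustration only, here is a self-contained C sketch of why the assignment works, with lex itself left out: the hypothetical next_int() scanner below reads every character through a global FILE *yyin, much as lex's generated input() macro does, so pointing yyin at another stream is all it takes to change where input comes from. The yyin definition here is a stand-in (in a real lex program it already lives in lex.yy.c and must not be redefined), and next_int() is not part of lex.

```c
#include <stdio.h>
#include <ctype.h>

/* Stand-in for lex's global stream pointer; in a real lex program
   yyin is already defined in lex.yy.c. */
FILE *yyin;

/* Hypothetical scanner: return the next unsigned integer read from
   yyin, or -1 at end of input.  Non-digit characters are skipped. */
int next_int(void)
{
    int c, value = 0;

    while ((c = getc(yyin)) != EOF && !isdigit(c))
        ;                               /* skip non-digits */
    if (c == EOF)
        return -1;
    do {
        value = value * 10 + (c - '0'); /* accumulate decimal digits */
    } while ((c = getc(yyin)) != EOF && isdigit(c));
    return value;
}
```

With input_fileptr open on a file containing "12 abc 30", doing yyin = input_fileptr; and then calling next_int() repeatedly yields 12, then 30, then -1.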

djones@megatest.UUCP (Dave Jones) (02/15/89)

> In article <32240@auc.UUCP> tab@auc.UUCP (0-Terrence Brannon ) writes:
>>My question is, is there any way that I could have the lex program accept
>>integers from somewhere other than stdin? ...
> 

The officially sanctioned way is to redefine the macros
input() and unput(). But if you want a quick-and-dirty,
probably unportable hack, read on.

WARNING!! PURISTS ARE CAUTIONED NOT TO READ BEYOND THIS POINT
IN OTHER THAN A THOROUGHLY KICKED BACK AND CHILLED OUT MOOD.
If mild flamage results, discontinue reading. In case of violent
flamage, or flames that persist beyond one reply posting, make
some funny faces in the mirror, while repeating, "Whootie, whootie, 
whootie," in a high-pitched voice.

Now then, the following is probably true for all BSD derived
Unixes, and maybe for others, I dunno.

I don't know if it's documented, but you can assign another
FILE* to the variable [yyin]. Then the standard input() and unput()
will work on that stream instead of stdin.

If you want the input to come from something other than a file
descriptor, as you seemed to imply, you can try being even more
devious:  Munge around with the struct _iobuf, perhaps using _IOMYBUF
and _IOSTRG flags. (If you have the source to sscanf() take a look
at that. It uses the same trick.)  I have not tried this, so if it 
doesn't work, well, that's life.  You will have to be sure that the 
buffer is never depleted, lest _filbuf get called.

And remember, if anyone asks, _I_ never told you to do it that way!

bill@twwells.uucp (T. William Wells) (02/16/89)

I have this at the top of one of my .l files:

%{
#undef input
#undef unput
# define input() (((yytchar=yysptr>yysbuf?U(*--yysptr): \
    nextchar())==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
# define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--; \
    *yysptr++=yytchar;}
%}

[These ugly macros are made from the ones that are created by lex.]

And one of my source files contains this bit of code:

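extern char *Bufptr;	/* assumed: declared and set elsewhere to the text being lexed */

int
nextchar()
{
	/* At the end of the buffer, keep returning newline (Bufptr is not advanced). */
	if (*Bufptr == 0) {
		return ('\n');
	} else {
		return (*Bufptr++);
	}
}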

Replace nextchar with whatever function you need for returning
characters.
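A possible nextchar() for lexing out of an arbitrary string, sketched under the assumption that the text is NUL-terminated. Unlike the version above, it returns EOF when the string runs out, which the input() macro above maps to lex's end-of-input value 0, instead of supplying newlines forever. Bufptr is the name from the post, made static here only to keep the sketch self-contained; set_buffer() is a hypothetical helper.

```c
#include <stdio.h>

/* Current position in the text being lexed. */
static const char *Bufptr = "";

/* Hypothetical helper: point the scanner at a new string. */
void set_buffer(const char *text)
{
    Bufptr = text;
}

/* Return the next character of the buffer, or EOF at its end.
   The unsigned char cast keeps a byte such as 0xFF from being
   mistaken for EOF on machines where plain char is signed. */
int nextchar(void)
{
    if (*Bufptr == '\0')
        return EOF;
    return (unsigned char) *Bufptr++;
}
```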

It does work; I use the program containing it all the time.  I don't
recall any of the design decisions I made when doing this; what it is
used for is lexing out of a buffer. I do not know if the details will
be applicable to your system. (It works on a Sun and on my Microport
V/386 3.0e).

Consider it as a clue as to what to do.

---
Bill
{ uunet!proxftl | novavax } !twwells!bill

henry@utzoo.uucp (Henry Spencer) (02/16/89)

In article <9580001@hpcuhb.HP.COM> najmi@hpcuhb.HP.COM (Farrukh Najmi) writes:
>... one can make it read from a file (using file pointer)
>by declaring the file pointer variable and then assigning
>it to yyin...

Be careful:  this is not a documented feature, at least not in all versions
of lex, and therefore should not be relied on too heavily in portable code.

nagel@paris.ics.uci.edu (Mark Nagel) (02/18/89)

In article <1989Feb15.200741.15963@utzoo.uucp>, henry@utzoo (Henry Spencer) writes:
|In article <9580001@hpcuhb.HP.COM> najmi@hpcuhb.HP.COM (Farrukh Najmi) writes:
|>... one can make it read from a file (using file pointer)
|>by declaring the file pointer variable and then assigning
|>it to yyin...
|
|Be careful:  this is not a documented feature, at least not in all versions
|of lex, and therefore should not be relied on too heavily in portable code.

Is there any reason (other than losing stdin) that you couldn't just
freopen stdin to the new input file?  That should be portable enough...
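A minimal sketch of the freopen() approach; redirect_stdin() and any file name passed to it are hypothetical. Because freopen() reuses the existing stdin stream object, code that already reads through stdin (a scanner whose yyin still equals stdin, for instance) picks up the new file without further changes. Either way the original standard input is gone, and if freopen() fails, stdin is left closed.

```c
#include <stdio.h>

/* Hypothetical helper: make stdin read from the named file.
   Returns 0 on success, -1 on failure.  freopen() keeps the same
   FILE object, so the pointer stdin stays valid and now refers
   to the new file. */
int redirect_stdin(const char *path)
{
    if (freopen(path, "r", stdin) == NULL)
        return -1;
    return 0;
}
```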

Mark Nagel @ UC Irvine, Dept of Info and Comp Sci
ARPA: nagel@ics.uci.edu              | Charisma doesn't have jelly in the
UUCP: {sdcsvax,ucbvax}!ucivax!nagel  | middle. -- Jim Ignatowski

tps@chem.ucsd.edu (Tom Stockfisch) (02/23/89)

In article <1989Feb15.200741.15963@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
[ discussion about using yyin/yyout in lex programs ]

>Be careful:  this is not a documented feature, at least not in all versions
>of lex, and therefore should not be relied on too heavily in portable code.

Have you run across a version of lex that does NOT have yyin/yyout?
I've always thought changing yyin was much more portable than re-defining
input().
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

henry@utzoo.uucp (Henry Spencer) (02/24/89)

In article <412@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>>Be careful:  this is not a documented feature, at least not in all versions
>>of lex, and therefore should not be relied on too heavily in portable code.
>
>Have you run across a version of lex that does NOT have yyin/yyout?
>I've always thought changing yyin was much more portable than re-defining
>input().

Re-defining "input" is in the documentation as the way to alter input; any
version of lex that follows the manual (and hence deserves to be called
"lex") will permit it and have it work as documented.  "yyin" is not
documented at all -- it is an accident of the original implementation --
and hence may change without warning between versions.

tps@chem.ucsd.edu (Tom Stockfisch) (02/24/89)

In article <1989Feb23.164129.8672@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <412@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>>Have you run across a version of lex that does NOT have yyin/yyout?
>>I've always thought changing yyin was much more portable than re-defining
>>input().

>Re-defining "input" is in the documentation as the way to alter input; any
>version of lex that follows the manual (and hence deserves to be called
>"lex") will permit it and have it work as documented.  "yyin" is not
>documented at all -- it is an accident of the original implementation --
>and hence may change without warning between versions.


I must be missing something.  Looking at lex.yy.c on my machine shows that
input() is defined as

# define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10? \
	(yylineno++,yytchar):yytchar)==EOF?0:yytchar)

so if I re-#define input(), don't I have to use yytchar,yysptr, and yysbuf?
I would assume one of these would be more likely to change than yyin.

henry@utzoo.uucp (Henry Spencer) (02/25/89)

In article <416@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>>Re-defining "input" is in the documentation as the way to alter input; any
>>version of lex that follows the manual (and hence deserves to be called
>>"lex") will permit it and have it work as documented.  "yyin" is not
>>documented at all...
>
>I must be missing something.  Looking at lex.yy.c on my machine shows that
>input() is defined as
>
># define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10? \
>	(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
>
>so if I re-#define input(), don't I have to use yytchar,yysptr, and yysbuf?

No; they are lex's buffered-input system.  If you redefine input(), you
have to provide your own buffering scheme, or have it provided for you
by a library.  You need not, and should not, use lex's own internal
variables to do it with.  The input interface to lex is fully defined
in the documentation:  input() should yield the next character, unput(c)
should push c back into the input stream.  Meet those specs (and a few
more details that are in the documentation) and it works fine; I've done
it on occasion.  Messing with mysterious and undocumented variables whose
names start with "yy" is neither necessary nor desirable.
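A minimal sketch of an input()/unput() pair written only to the interface described above, with its own pushback stack and none of lex's yy variables. It assumes the text comes from a NUL-terminated string; my_input(), my_unput(), scan_string() and the 200-character pushback limit are all hypothetical choices, and other documented details such as output() are not covered.

```c
#include <stddef.h>

#define PUSHBACK_MAX 200             /* arbitrary limit for this sketch */

static const char *src = "";         /* remaining unread text */
static int pushback[PUSHBACK_MAX];   /* characters given back by unput */
static size_t npushed = 0;

/* Hypothetical helper: start scanning a new string. */
void scan_string(const char *text)
{
    src = text;
    npushed = 0;
}

/* input(): return pushed-back characters first, then the string;
   0 at end of input, which is what lex's own input() returns. */
int my_input(void)
{
    if (npushed > 0)
        return pushback[--npushed];
    if (*src == '\0')
        return 0;
    return (unsigned char) *src++;
}

/* unput(c): push c back so a later my_input() returns it; the last
   character pushed is the first returned, as with lex's own unput().
   Overflow is silently ignored in this sketch. */
void my_unput(int c)
{
    if (npushed < PUSHBACK_MAX)
        pushback[npushed++] = c;
}
```

In the .l file one would then #undef lex's input and unput, as in Bill's post earlier in the thread, and add #define input() my_input() and #define unput(c) my_unput(c).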

I have, on occasion, just assigned a stdio file pointer to yyin... but
I've never claimed it was portable or good practice, and I've always
documented it as a bug.

If you define your own input stuff, the variables lex uses for its input
system just go unused.  This is a minor waste of space.  It could undoubtedly
be eliminated by some cleverness in lex, if anyone felt it worth bothering
about.

nagel@paris.ics.uci.edu (Mark Nagel) (02/25/89)

In article <416@chem.ucsd.EDU>, tps@chem (Tom Stockfisch) writes:
|
|I must be missing something.  Looking at lex.yy.c on my machine shows that
|input() is defined as
|
|# define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10? \
|	(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
|
|so if I re-#define input(), don't I have to use yytchar,yysptr, and yysbuf?
|I would assume one of these would be more likely to change than yyin.

If you redefine input() in lex, you'd most likely redefine it in such
a way that you'd manage the input entirely yourself.  So you would have
no need for lex's internal input management.


rsalz@bbn.com (Rich Salz) (02/25/89)

In <1989Feb24.165832.9921@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
<... If you redefine input(), you
<have to provide your own buffering scheme, or have it provided for you
<by a library. ...

You will probably also want to arrange to increment yylineno on every
newline.  This is often done in the pattern/action part.
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

earleh@eleazar.dartmouth.edu (Earle R. Horton) (02/25/89)

In article <1521@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>In <1989Feb24.165832.9921@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
><... If you redefine input(), you
><have to provide your own buffering scheme, or have it provided for you
><by a library. ...
>
>You will probably also want to arrange to increment yylineno on every
>newline.  This is often done in the pattern/action part.

     This is only true if you USE yylineno.  Lex doesn't use it for
anything beyond keeping track of it for you.  If you don't care what
the line number is, lex doesn't either.
Earle R. Horton. 23 Fletcher Circle, Hanover, NH 03755--Graduate student.
He who puts his hand to the plow and looks back is not fit for the kingdom
of winners.  In any case, 'BACK' doesn't work.

Greg.Langham.Of.7101/7@f7.n7101.z8.FIDONET.ORG (Greg Langham Of 7101/7) (03/03/89)

This is a good point.  However, this is very unlikely...


--  
The Fish Aquarium.

sme@computing-maths.cardiff.ac.uk (Simon Elliott) (03/08/89)

In article <416@chem.ucsd.EDU>, tps@chem.ucsd.edu (Tom Stockfisch) writes:
: I must be missing something.  Looking at lex.yy.c on my machine shows that
: input() is defined as
: 
: # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10? \
: 	(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
: 
: so if I re-#define input(), don't I have to use yytchar,yysptr, and yysbuf?
: I would assume one of these would be more likely to change than yyin.

But if you look again, you will notice that this and unput() are the only
uses of these variables, so there is no objection to using other
variables in your own input macros/functions.
-- 
--------------------------------------------------------------------------
Simon Elliott            Internet: sme%v1.cm.cf.ac.uk@cunyvm.cuny.edu
UWCC Computer Centre     JANET:    sme@uk.ac.cf.cm.v1
40/41 Park Place         UUCP:     {backbones}!mcvax!ukc!reading!cf-cm!sme
Cardiff, Wales           PHONE:    +44 222 874300