jdc@naucse.UUCP (John Campbell) (05/11/88)
Well, I just finished making flex (Fast lex from Vern Paxson) work on VMS.
The reason for this posting is to raise a 'C' question and to let people
know flex will run on VMS.

The 'C' question:
Flex has the following global line:

    FILE *yyin=stdin, *yyout=stdout;

which does not work at compile time on VMS. In other words, it appears
the compiler does not treat stdin as a constant--its value is known only
at run-time. (VMS stdio.h says "extern noshare FILE *stdin;".) To work
around this I had to concoct a fake main():

    FILE *yyin, *yyout;
    main()
    {
        yyin = stdin;
        yyout = stdout;

Question: Is my compiler deficient? Is the initialization done in flex
supposed to work in ANSI C?

Flex on VMS:
For those interested, the following changes were made to make flex work
on VMS:
1) 2 macro names > 31 characters were changed,
2) some file names were corrected to fit the VMS file system,
3) the yyin problem mentioned above was worked around,
4) bzero was defined as OTS$MOVEC5, and
5) unlink() was replaced with delete().

If there is enough interest I can post a SEARCH for VMS (300 lines)
indicating how the original was changed. I'm afraid my port is only a
start toward folding VMS support back into the original. Anyone wanting
to improve on my effort is more than welcome, but I fear the unix
community may be less than sympathetic to those of us stuck on VMS :-).

MUCH THANKS TO VERN PAXSON, KEVIN GONG, VAN JACOBSON, ET AL.!!!!!!
--
John Campbell ...!arizona!naucse!jdc
unix? Sure send me a dozen, all different colors.
scjones@sdrc.UUCP (Larry Jones) (05/12/88)
In article <690@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
> The 'C' question:
> Flex has the following global line:
>
>     FILE *yyin=stdin, *yyout=stdout;
>
> which does not work at compile time on VMS. In other words, it appears
> the compiler does not treat stdin as a constant--its value is known only
> at run-time. (VMS stdio.h says "extern noshare FILE *stdin;".) To work

According to the latest ANSI draft (and many of the previous ones), stdin
and friends are simply expressions and not necessarily constant
expressions. Thus, they may not be used portably to initialize objects
with static storage duration. So, the compiler's OK; flex is not
maximally portable (as you found out).

----
Larry Jones                     UUCP: ...!sdrc!scjones
SDRC                            AT&T: (513) 576-2070
2000 Eastman Dr.                BIX:  ltl
Milford, OH  45150              "When all else fails, read the directions."
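One portable way around the static-initializer restriction is to leave the file pointers null at program start and bind them to stdin and stdout on first use, at run time. The sketch below only illustrates that idea; the names scan_in, scan_out, get_scan_in() and get_scan_out() are hypothetical stand-ins for yyin and yyout and are not taken from flex.

```c
#include <stdio.h>

/* Hypothetical stand-ins for flex's yyin / yyout. A null pointer is a
 * constant expression, so this initialization is portable even where
 * stdin and stdout are not compile-time constants. */
static FILE *scan_in  = NULL;
static FILE *scan_out = NULL;

/* Return the input stream, defaulting to stdin on first use. */
FILE *get_scan_in(void)
{
    if (scan_in == NULL)
        scan_in = stdin;    /* evaluated at run time, not at compile time */
    return scan_in;
}

/* Return the output stream, defaulting to stdout on first use. */
FILE *get_scan_out(void)
{
    if (scan_out == NULL)
        scan_out = stdout;
    return scan_out;
}
```

A caller that wants a different stream can still assign to the pointer before the first call; the default is applied only when the pointer is still null.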
wcs@skep2.ATT.COM (Bill.Stewart.<ho95c>) (05/20/88)
In article <278@sdrc.UUCP> scjones@sdrc.UUCP (Larry Jones) writes:
:In article <690@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
:> The 'C' question:
:> Flex has the following global line:
:>     FILE *yyin=stdin, *yyout=stdout;
:> which does not work at compile time on VMS. In other words, it appears
:> the compiler does not treat stdin as a constant--its value is known only
:> at run-time. (VMS stdio.h says "extern noshare FILE *stdin;".) To work
:
:According to the latest ANSI draft (and many of the previous ones), stdin and
:friends are simply expressions and not necessarily constant expressions. Thus,
:they may not be used portably to initialize objects with static storage
:duration. So, the compiler's OK, flex is not maximally portable (as you

If the compiler's OK, what's this "noshare" business? It's been a while
since I've seen the ANSI C draft, but I don't remember that being in it -
it looks kind of like noalias? (Noalias had to go, and did - it was
non-negotiable.) Also, something you declare to be extern shouldn't be
assumed to be constant. So both are somewhat non-portable.
--
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# skep2 is a local machine I'm trying to turn into a server. Please send
# mail to ho95c or ho95e instead. Thanks.
LEICHTER@VENUS.YCC.YALE.EDU ("Jerry Leichter ", LEICHTER-JERRY@CS.YALE.EDU) (05/24/88)
    :> The 'C' question:
    :> Flex has the following global line:
    :>     FILE *yyin=stdin, *yyout=stdout;
    :> which does not work at compile time on VMS. In other words, it
    :> appears the compiler does not treat stdin as a constant--its
    :> value is known only at run-time. (VMS stdio.h says
    :> "extern noshare FILE *stdin;".) To work
    :
    :According to the latest ANSI draft (and many of the previous ones),
    :stdin and friends are simply expressions and not necessarily
    :constant expressions. Thus, they may not be used portably to
    :initialize objects with static storage duration. So, the compiler's
    :OK, flex is not maximally portable (as you

    If the compiler's OK, what's this "noshare" business? It's been a
    while since I've seen the ANSI C draft, but I don't remember that
    being in it - it looks kind of like noalias? (Noalias had to go, and
    did - it was non-negotiable.) Also, something you declare to be
    extern shouldn't be assumed to be constant. So both are somewhat
    non-portable.

"noshare" is a VAX C extension to control storage allocation; it is mainly
needed when creating shared images --- it says that the variable is to go
into a "copy on reference" segment, rather than being shared among all
users of the shared image. Since the VAX C run-time library, including all
the code for such things as printf, normally lives in a shared library
(VAXCRTL.EXE), you can see that stdio had better be "noshare" unless you
want everyone on the system writing to the same standard output!

"noshare" pre-dates ANSI, with which it is certainly not compatible. It
could presumably be replaced with a #pragma or something like "__noshare"
to be compatible.

Whether it can appear in the declaration of stdin and friends is an
interesting question. What it comes down to is: Does the ANSI spec
guarantee a particular declaration syntax for these things, or does it
simply guarantee that some sort of appropriate - but system-specific -
definition will be available if stdio.h is included?

-- Jerry
jdc@naucse.UUCP (John Campbell) (08/04/89)
The latest flex was posted a month or so ago by Vern Paxson. (Flex is a
faster lex that will be replacing most older lexes on many unix systems.)
I'm working on bringing it up on VMS and I've run into the following
situation...

The "older" flex read a line at a time (YY_MAX_LINE) and YY_INPUT worked
pretty much without a hitch. The new flex reads a large chunk at a
time. With VMS STREAM-LF files this works just fine--but with "normal"
VFC editor text files (darn these RMS things) the VMS 'C' rtl will only
return at most 1 record full of characters for any large read() byte
request.

During processing on a flex input file, flex complains of a "NULL in
input." This seems to be because yyunput() wants to "shift things up
to make room" and assumes that the end of the valid buffer is around
YY_BUF_SIZE deep. On VMS, with its non-standard read() behavior, the
end of the valid data is frequently not near YY_BUF_SIZE. (On non-record
oriented file systems read() will, of course, only do this on the last
buffer.)

So my problem is not that I can't understand what is causing the "NULL
in input" message, but a request for what the best solution might be.
I could, of course, create a special VMS YY_INPUT macro that fills
the buffer like a unix read() would using getc, or I could try to
patch yyunput() and hope that the read() behavior assumption is isolated
to this spot in flex's code.

If you have an opinion on which way to go I'd like to hear it. If you
have already solved the problem I'd like to know what you did. If you
can think of a reason why this read() assumption is a bad idea for unix
(streams and producer/consumers that might not always behave like flex
is assuming) I'd like to know about that also.

If I've been unclear and you want to know what the he-- I'm talking about
just mail me a message indicating where I was unclear. I'm going to try
hard to shelve this project for a day or so...
--
John Campbell ...!arizona!naucse!jdc
CAMPBELL@NAUVAX.bitnet
unix? Sure send me a dozen, all different colors.
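The first option mentioned above, filling the buffer with getc so that record boundaries no longer cut a read short, might look roughly like the following. This is a minimal sketch, not code from flex or from the VMS port; fill_from_stream() is a hypothetical helper that a custom YY_INPUT could call.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read up to max_size bytes from fp, one character
 * at a time. It stops early only at end-of-file or on a read error, so a
 * short count means "no more data" rather than "end of record". */
size_t fill_from_stream(FILE *fp, char *buf, size_t max_size)
{
    size_t n = 0;
    int c;

    while (n < max_size && (c = getc(fp)) != EOF)
        buf[n++] = (char)c;

    return n;   /* caller can use ferror(fp) to tell an error from EOF */
}
```

The cost is one getc() call per character, which gives back some of the speed the block-read design was after; whether that matters would have to be measured.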
scs@adam.pika.mit.edu (Steve Summit) (08/18/89)
In article <1629@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
>The new flex reads a large chunk at a
>time. With VMS STREAM-LF files this works just fine--but with "normal"
>VFC editor text files (darn these RMS things) the VMS 'C' rtl will only
>return at most 1 record full of characters for any large read() byte
>request.
>During processing on a flex input file, flex complains of a "NULL in
>input." This seems to be because yyunput() wants to "shift things up
>to make room" and assumes that the end of the valid buffer is around
>YY_BUF_SIZE deep.

This sounds like a bug in flex. If I understand the complaint
correctly, the code gets confused when the buffer is not (?)
substantially full. (This sounds odd; code usually fails when
buffers fill up, not when they stay relatively empty.)

Flex should certainly be fixed to handle "short" reads. The set
of conditions under which read() is guaranteed to return its
third argument is much smaller than the set of exception cases --
those in which, though succeeding (neither error nor EOF), read()
returns fewer characters than requested. In fact, the set of
"normal" cases has exactly one element: reads from disk files in
which as many bytes as are requested exist between the current
offset and end-of-file. This set can further be restricted to
Unix systems (VMS and MS-DOS read emulations do not necessarily
comply), and I wouldn't be surprised if there are distributed
filesystems or other wrinkles existing under apparently pristine
Unix variants which also cause the assumption to break down.

The message is clear: never assume read() will return everything
you ask for. This is usually straightforward, and I can't
imagine why flex is having trouble with it.

(Flex is probably doing something wildly inappropriate in its
input buffering strategy, doubtless out of efficiency concerns,
which actually might be acceptable in a lexer, lexical analysis
being a frequent bottleneck, but that is still no excuse for
incorrect or unportable code.)

Steve Summit
scs@adam.pika.mit.edu

>oriented file systems read() will, of course, only do this on the last
>buffer.)
>
>So my problem is not that I can't understand what is causing the "NULL
>in input" message, but a request for what the best solution might be.
>I could, of course, create a special VMS YY_INPUT macro that fills
>the buffer like a unix read() would using getc, or I could try to
>patch yyunput() and hope that the read() behavior assumption is isolated
>to this spot in flex's code.
>
>If you have an opinion on which way to go I'd like to hear it. If you
>have already solved the problem I'd like to know what you did. If you
>can think of a reason why this read() assumption is a bad idea for unix
>(streams and producer/consumers that might not always behave like flex
>is assuming) I'd like to know about that also.
>
>If I've been unclear and you want to know what the he-- I'm talking about
>just mail me a message indicating where I was unclear. I'm going to try
>hard to shelve this project for a day or so...
>--
> John Campbell ...!arizona!naucse!jdc
> CAMPBELL@NAUVAX.bitnet
> unix? Sure send me a dozen, all different colors.
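The usual way to follow the "never assume read() will return everything" rule on a Unix-style system is to loop until the request is satisfied, end-of-file is reached, or an error occurs. The sketch below assumes a POSIX read(); read_full() is a hypothetical helper name and is not part of flex.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `want` bytes have
 * arrived, end-of-file is seen, or an error occurs. Returns the number
 * of bytes actually stored in buf, or -1 on error. */
ssize_t read_full(int fd, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t r = read(fd, buf + got, want - got);

        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* genuine error */
        }
        if (r == 0)
            break;              /* end-of-file: a short count is final */
        got += (size_t)r;       /* short read: ask for the remainder */
    }
    return (ssize_t)got;
}
```

With a helper like this, a count smaller than the request reliably means end-of-file, which is the property the buffering code appears to be relying on.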
rgr@cbnewsm.ATT.COM (robert.g.robillard) (08/23/89)
Has there been a good solution developed to the "NULL in input" problem
you get when you try to use Flex on VMS? If so, could somebody post it?

(If it's been posted and I missed it, sorry about that. Could somebody
mail it to me?)

Mucho Gracias
--
| Duke Robillard |
| Internet: rgr@m21ux.att.com | BITNET: rgr%m21ux.uucp@psuvax1 |
| UUCP: {backbone!}att!m21ux!rgr | (maybe) |
jdc@naucse.UUCP (John Campbell) (08/26/89)
From article <13603@bloom-beacon.MIT.EDU>, by scs@adam.pika.mit.edu (Steve Summit):
: In article <1629@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
:>The new flex reads a large chunk at a
:>time. With VMS STREAM-LF files this works just fine--but with "normal"
:>VFC editor text files (darn these RMS things) the VMS 'C' rtl will only
:>return at most 1 record full of characters for any large read() byte
:>request.
:>During processing on a flex input file, flex complains of a "NULL in
:>input." This seems to be because yyunput() wants to "shift things up
:>to make room" and assumes that the end of the valid buffer is around
:>YY_BUF_SIZE deep.
:
: This sounds like a bug in flex. If I understand the complaint
: correctly, the code gets confused when the buffer is not (?)
: substantially full. (This sounds odd; code usually fails when
: buffers fill up, not when they stay relatively empty.)
I'm the original poster of the first article. To date I have not proven
that there is a bug in flex. I still believe there is one, but I haven't
had the time to make the obvious tests in other environments. I still
believe that yyunput() is in error in some way, but I would like to have
some follow-up information before I bug Vern Paxson with my worries.
I can tell you that using VMS fread() instead of read() for initscan.c
seems to work just fine on all formats of input files:
#define YY_INPUT(buf,result,max_size) \
    if ( (result = fread(buf, 1, max_size, yyin)) == 0 ) \
        if (ferror(yyin)) \
            YY_FATAL_ERROR( "fread() in flex scanner failed" );
I'm a little confused because, with no change to the yyunput() routine, I
can run one of my old programs which has to replace YY_INPUT() with my
own read routine to toss out '\0's. I find that this routine, which does
not try to fill up to max_size, works for the regular expressions that
I analyze. At this point I assume there is something more complicated
going on in the lexer for flex itself than in the small ditty I wrote.
Anyway, if anyone would make some YY_INPUT() substitutions (like use
getc()) and see what happens when they recompile and test flex itself
we'd all be very happy. I work for a university that is about to start
classes so I'm overwhelmed right now.
--
John Campbell ...!arizona!naucse!jdc
CAMPBELL@NAUVAX.bitnet
unix? Sure send me a dozen, all different colors.
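A macro that expands to bare nested if statements, like the fread() version above, can interact badly with a surrounding if/else at the call site. A common C idiom is to wrap the body in do { ... } while (0) so that the macro behaves like a single statement. The sketch below shows the idiom only; DEMO_INPUT, DEMO_FATAL_ERROR and demo_read_chunk() are hypothetical names, and the extra fp parameter is an assumption made so the example stands alone (the macro above takes three arguments and uses yyin).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for YY_FATAL_ERROR so the sketch compiles alone. */
#define DEMO_FATAL_ERROR(msg) \
    do { fputs(msg, stderr); fputc('\n', stderr); exit(1); } while (0)

/* fread()-based input wrapped as a single statement. A zero result is
 * fatal only when the stream reports an error; plain EOF is not. */
#define DEMO_INPUT(buf, result, max_size, fp)                   \
    do {                                                        \
        (result) = fread((buf), 1, (max_size), (fp));           \
        if ((result) == 0 && ferror(fp))                        \
            DEMO_FATAL_ERROR("fread() in scanner failed");      \
    } while (0)

/* The macro is used directly before an else, which is safe with the
 * do/while wrapper. */
size_t demo_read_chunk(FILE *fp, char *buf, size_t max_size, int enabled)
{
    size_t n = 0;

    if (enabled)
        DEMO_INPUT(buf, n, max_size, fp);
    else
        n = 0;

    return n;
}
```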
jct@jct.UUCP (jct) (08/30/89)
In article <1664@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
> From article <13603@bloom-beacon.MIT.EDU>, by scs@adam.pika.mit.edu (Steve Summit):
> : In article <1629@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
> :>The new flex reads a large chunk at a
> :>time. With VMS STREAM-LF files this works just fine--but with "normal"
>
> Stuff deleted
>
> : This sounds like a bug in flex. If I understand the complaint
>
> I'm the original poster of the first article. To date I have not proven
> that there is a bug in flex. I still believe this is true, but I haven't
> had the time to make the obvious tests in other environments. I still
> believe that yyunput() is in error in some way, but I would like to have
> some follow up information before I bug Vern Paxson with my worries.
>
> Stuff deleted
>

I noticed the same problems. I traced it to a macro substitution
occurring at the beginning of a read() buffer. The substitution uses
yyunput() to do the actual change. Since on VMS this is the beginning of
a line, it shows up more easily than on UNIX.

I say it's a flat-out bug in yyunput(). I changed it to the following
(which is less CPU efficient but works):

#ifdef __STDC__
void yyunput (int c, register char *yy_bp)
#else
void yyunput (c, yy_bp)
int c;
register char *yy_bp;
#endif
{
    register char *yy_cp = yy_c_buf_p;

    *yy_cp = yy_hold_char; /* undo effects of setting up yytext */

    if (yy_cp < yy_ch_buf + 2)
    { /* need to shift up to make room */
        register int number_to_move = yy_n_chars + 2; /* +2 for EOB chars */
        register char *source = &yy_ch_buf [number_to_move];
        register char *dest = source + 1;

        while (source > yy_ch_buf)
            *--dest = *--source;

        yy_cp++;
        yy_bp++;
        yy_n_chars++;

        if (yy_cp < yy_ch_buf + 2)
            YY_FATAL_ERROR ("flex scanner push-back overflow");
    }

    if (yy_cp > yy_bp && yy_cp [-1] == '\n')
        yy_cp [-2] = '\n';

    *--yy_cp = c;

    YY_DO_BEFORE_ACTION; /* set up yytext again */
}

That is, instead of moving the whole buffer up to the end, just move it
up to make room for 1 more character. I've had no problems since.

John Tompkins
occrsh!jct!jct
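The "shift up by one" idea can be seen in isolation on a small standalone buffer. The sketch below is an illustration under simplified assumptions, not flex's data structures: struct pushbuf, PB_CAP and pb_unput() are hypothetical, there are no end-of-buffer sentinel characters, and the valid length is tracked explicitly, so the shift never depends on how full the last read left the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PB_CAP 8    /* hypothetical capacity, kept small for illustration */

struct pushbuf {
    char   data[PB_CAP];
    size_t pos;     /* index of the next character to be read */
    size_t len;     /* number of valid characters, data[0..len) */
};

/* Push c back so that it becomes the next character read.
 * Returns 0 on success, -1 if there is no room left. */
int pb_unput(struct pushbuf *pb, char c)
{
    assert(pb->pos <= pb->len && pb->len <= PB_CAP);

    if (pb->pos == 0) {
        /* No free slot in front: move only the valid data up by one. */
        if (pb->len == PB_CAP)
            return -1;
        memmove(pb->data + 1, pb->data, pb->len);
        pb->pos = 1;
        pb->len++;
    }
    pb->data[--pb->pos] = c;
    return 0;
}
```

Shifting by a single slot costs one memmove per push-back at the front of the buffer, which matches the "less CPU efficient but works" trade-off described above.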