[comp.os.vms] Flex

jdc@naucse.UUCP (John Campbell) (05/11/88)

Well I just finished making flex (Fast lex from Vern Paxson) work on 
VMS.  The reason for this posting is to raise a 'C' question and to 
let people know flex will run on VMS.

The 'C' question:
  Flex has the following global line:

     FILE *yyin=stdin, *yyout=stdout;

  which does not work at compile time on VMS.  In other words, it appears
  the compiler does not treat stdin as a constant--its value is known only
  at run-time.  (VMS stdio.h says "extern noshare FILE *stdin;".)  To work 
  around this I had to concoct a fake main():
     FILE *yyin, *yyout;
     main()
     {
        yyin = stdin; yyout = stdout;
  
  Question: Is my compiler deficient?  Is the initialization done in flex
  supposed to work in ANSI C?  
  
Flex on VMS:
  For those interested, the following changes were made to make flex work on
  VMS: 1) 2 macro names > 31 characters were changed, 2) some file names were
  corrected to fit the VMS file system, 3) the yyin problem mentioned above
  was worked around, 4) bzero was defined as OTS$MOVEC5, and 5) unlink() was
  replaced with delete().
  
  If there is enough interest I can post a SEARCH for VMS (300 lines) 
  indicating how the original was changed.  I'm afraid my port is only a 
  start toward folding VMS support back into the original.  Anyone wanting 
  to improve on my effort is more than welcome, but I fear the unix community
  may be less than sympathetic to those of us stuck on VMS :-).
  
MUCH THANKS TO VERN PAXSON, KEVIN GONG, VAN JACOBSON, ET AL.!!!!!!  
-- 
	John Campbell               ...!arizona!naucse!jdc

	unix?  Sure send me a dozen, all different colors.

scjones@sdrc.UUCP (Larry Jones) (05/12/88)

In article <690@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
> The 'C' question:
>   Flex has the following global line:
> 
>      FILE *yyin=stdin, *yyout=stdout;
> 
>   which does not work at compile time on VMS.  In other words, it appears
>   the compiler does not treat stdin as a constant--its value is known only
>   at run-time.  (VMS stdio.h says "extern noshare FILE *stdin;".)  To work 

According to the latest ANSI draft (and many of the previous ones), stdin and
friends are simply expressions and not necessarily constant expressions.  Thus,
they may not be used portably to initialize objects with static storage
duration.  So, the compiler's OK, flex is not maximally portable (as you
found out).
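
A minimal sketch of the portable alternative, assuming nothing about flex
beyond the two globals quoted above: leave the pointers uninitialized at file
scope (so they start out null) and assign the defaults at run time, where
stdin and stdout are ordinary expressions.  The helper name
init_scanner_streams() is hypothetical, not something flex provides.

```c
#include <stdio.h>

/* No initializer at file scope, so both pointers start out as NULL. */
FILE *yyin, *yyout;

/* Hypothetical helper (not part of flex): choose the default streams at
   run time, keeping any stream the caller has already assigned. */
void init_scanner_streams(void)
{
    if (yyin == NULL)
        yyin = stdin;
    if (yyout == NULL)
        yyout = stdout;
}
```

Calling the helper once before the first scan has the same effect as the
original initializers, and it does not override a yyin or yyout that the
program set itself.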

----
Larry Jones                         UUCP: ...!sdrc!scjones
SDRC                                AT&T: (513) 576-2070
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150
"When all else fails, read the directions."

wcs@skep2.ATT.COM (Bill.Stewart.<ho95c>) (05/20/88)

In article <278@sdrc.UUCP> scjones@sdrc.UUCP (Larry Jones) writes:
:In article <690@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
:> The 'C' question:
:>   Flex has the following global line:
:>      FILE *yyin=stdin, *yyout=stdout;
:>   which does not work at compile time on VMS.  In other words, it appears
:>   the compiler does not treat stdin as a constant--its value is known only
:>   at run-time.  (VMS stdio.h says "extern noshare FILE *stdin;".)  To work 
:
:According to the latest ANSI draft (and many of the previous ones), stdin and
:friends are simply expressions and not necessarily constant expressions.  Thus,
:they may not be used portably to initialize objects with static storage
:duration.  So, the compiler's OK, flex is not maximally portable (as you

If the compiler's OK, what's this "noshare" business?  It's been a while
since I've seen the ANSI C draft, but I don't remember that being in it -
it looks kind of like noalias?  (Noalias had to go, and did - it was
non-negotiable.)  Also, something you declare to be extern shouldn't be
assumed to be constant.  So both are somewhat non-portable.
-- 
#				Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# skep2 is a local machine I'm trying to turn into a server.  Please send
# mail to ho95c or ho95e instead.  Thanks.

LEICHTER@VENUS.YCC.YALE.EDU ("Jerry Leichter ", LEICHTER-JERRY@CS.YALE.EDU) (05/24/88)

	:> The 'C' question:
	:>   Flex has the following global line:
	:>      FILE *yyin=stdin, *yyout=stdout;
	:>   which does not work at compile time on VMS.  In other words, it
	:>   appears the compiler does not treat stdin as a constant--its
	:>   value is known only at run-time.  (VMS stdio.h says
	:>   "extern noshare FILE *stdin;".)  To work 
	:
	:According to the latest ANSI draft (and many of the previous ones),
	:stdin and friends are simply expressions and not necessarily
	:constant expressions.  Thus, they may not be used portably to
	:initialize objects with static storage duration.  So, the compiler's
	:OK, flex is not maximally portable (as you

	If the compiler's OK, what's this "noshare" business?  It's been a
	while since I've seen the ANSI C draft, but I don't remember that
	being in it - it looks kind of like noalias?  (Noalias had to go, and
	did - it was non-negotiable.)  Also, something you declare to be
	extern shouldn't be assumed to be constant.  So both are somewhat
	non-portable.

"noshare" is a VAX C extension to control storage allocation; it is mainly
needed when creating shared images --- it says that the variable is to go
into a "copy on reference" segment, rather than being shared among all users
of the shared image.  Since the VAX C run-time library, including all the code
for such things as printf, normally lives in a shared library (VAXCRTL.EXE),
you can see that stdio had better be "noshare" unless you want everyone on
the system writing to the same standard output!

"noshare" pre-dates ANSI, which it is certainly not compatible with.  It could
presumably be replaced with a #pragma or something like "__noshare" to be
compatible.

Whether it can appear in the declaration of stdin and friends is an
interesting question.  What it comes down to is:  Does the ANSI spec guarantee
a particular declaration syntax for these things, or does it simply guarantee
that some sort of appropriate - but system-specific - definition will be
available if stdio.h is included?
							-- Jerry

jdc@naucse.UUCP (John Campbell) (08/04/89)

The latest flex was posted a month or so ago by Vern Paxson.  (Flex is
a faster lex that will be replacing most older lexes on many unix systems.)
I'm working on bringing it up on VMS and I've run into the following
situation...

The "older" flex read a line at a time (YY_MAX_LINE) and YY_INPUT worked
pretty much without a hitch.  The new flex reads a large chunk at a
time.  With VMS STREAM-LF files this works just fine--but with "normal"
VFC editor text files (darn these RMS things) the VMS 'C' rtl will only
return at most 1 record full of characters for any large read() byte
request.

During processing on a flex input file, flex complains of a "NULL in
input."  This seems to be because yyunput() wants to "shift things up
to make room" and assumes that the end of the valid buffer is around
YY_BUF_SIZE deep.  On VMS, with its non-standard read() behavior, the
end of the valid data is frequently not near YY_BUF_SIZE.  (On non-record
oriented file systems read() will, of course, only do this on the last
buffer.)  

So my problem is not that I can't understand what is causing the "NULL
in input" message, but a request for what the best solution might be.
I could, of course, create a special VMS YY_INPUT macro that fills
the buffer like a unix read() would using getc, or I could try to
patch yyunput() and hope that the read() behavior assumption is isolated
to this spot in flex's code.  

If you have an opinion on which way to go I'd like to hear it.  If you
have already solved the problem I'd like to know what you did.  If you
can think of a reason why this read() assumption is a bad idea for unix
(streams and producer/consumers that might not always behave like flex
is assuming) I'd like to know about that also.

If I've been unclear and you want to know what the he-- I'm talking about
just mail me a message indicating where I was unclear.  I'm going to try
hard to shelve this project for a day or so...
-- 
	John Campbell               ...!arizona!naucse!jdc
                                    CAMPBELL@NAUVAX.bitnet
	unix?  Sure send me a dozen, all different colors.

scs@adam.pika.mit.edu (Steve Summit) (08/18/89)

In article <1629@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
>The new flex reads a large chunk at a
>time.  With VMS STREAM-LF files this works just fine--but with "normal"
>VFC editor text files (darn these RMS things) the VMS 'C' rtl will only
>return at most 1 record full of characters for any large read() byte
>request.
>During processing on a flex input file, flex complains of a "NULL in
>input."  This seems to be because yyunput() wants to "shift things up
>to make room" and assumes that the end of the valid buffer is around
>YY_BUF_SIZE deep.

This sounds like a bug in flex.  If I understand the complaint
correctly, the code gets confused when the buffer is not (?)
substantially full.  (This sounds odd; code usually fails when
buffers fill up, not when they stay relatively empty.)

Flex should certainly be fixed to handle "short" reads.  The set
of conditions under which read() is guaranteed to return its
third argument is much smaller than the set of exception cases --
those in which, though succeeding (neither error nor EOF) read
returns fewer characters than requested.  In fact, the set of
"normal" cases has exactly one element: reads from disk files in
which as many bytes as are requested exist between the current
offset and end-of-file.  This set can further be restricted to
Unix systems (VMS and MS-DOS read emulations do not necessarily
comply), and I wouldn't be surprised if there are distributed
filesystems or other wrinkles existing under apparently pristine
Unix variants which also cause the assumption to break down.

The message is clear: never assume read() will return everything
you ask for.  This is usually straightforward, and I can't
imagine why flex is having trouble with it.  (flex is probably
doing something wildly inappropriate in its input buffering
strategy, doubtless out of efficiency concerns, which actually
might be acceptable in a lexer, lexical analysis being a frequent
bottleneck, but still no excuse for incorrect or unportable code.)
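
A minimal sketch of the usual defence against short reads, assuming a
POSIX-style read() and <unistd.h> (a VMS port would need its own headers).
The helper read_fully() is hypothetical and is not part of flex; it simply
keeps calling read() until the requested count has arrived, end-of-file is
reached, or a real error occurs.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: collect up to `want` bytes from fd into buf,
   tolerating short reads.  Returns the number of bytes stored (less than
   `want` only at end of file), or -1 on a read error. */
long read_fully(int fd, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);

        if (n > 0)
            got += (size_t)n;   /* possibly a short read: go round again */
        else if (n == 0)
            break;              /* end of file */
        else if (errno != EINTR)
            return -1;          /* real error; an interrupted call is retried */
    }
    return (long)got;
}
```

A caller that treats only a return value of 0 as end of input, and otherwise
uses exactly the count returned, makes no assumption about how full the
buffer is.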

                                            Steve Summit
                                            scs@adam.pika.mit.edu


rgr@cbnewsm.ATT.COM (robert.g.robillard) (08/23/89)

Has there been a good solution developed to the "NULL in input"
problem you get when you try to use Flex on VMS?  If so, could
somebody post it?  (If it's been posted and I missed it, sorry
about that.  Could somebody mail it to me?)

Mucho Gracias
-- 
|  Duke Robillard                                                        |
|  Internet: rgr@m21ux.att.com         | BITNET: rgr%m21ux.uucp@psuvax1  |
|  UUCP:     {backbone!}att!m21ux!rgr  |          (maybe)                |

jdc@naucse.UUCP (John Campbell) (08/26/89)

From article <13603@bloom-beacon.MIT.EDU>, by scs@adam.pika.mit.edu (Steve Summit):
: In article <1629@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
:>The new flex reads a large chunk at a
:>time.  With VMS STREAM-LF files this works just fine--but with "normal"
:>VFC editor text files (darn these RMS things) the VMS 'C' rtl will only
:>return at most 1 record full of characters for any large read() byte
:>request.
:>During processing on a flex input file, flex complains of a "NULL in
:>input."  This seems to be because yyunput() wants to "shift things up
:>to make room" and assumes that the end of the valid buffer is around
:>YY_BUF_SIZE deep.
: 
: This sounds like a bug in flex.  If I understand the complaint
: correctly, the code gets confused when the buffer is not (?)
: substantially full.  (This sounds odd; code usually fails when
: buffers fill up, not when they stay relatively empty.)

I'm the original poster of the first article.  To date I have not proven
that there is a bug in flex.  I still believe there is one, most likely in
yyunput(), but I haven't had the time to make the obvious tests in other
environments, and I would like to have some follow-up information before I
bug Vern Paxson with my worries.

I can tell you that using VMS fread() instead of read() for initscan.c
seems to work just fine on all formats of input files:  

#define YY_INPUT(buf,result,max_size) \
        if ( (result = fread(buf, 1, max_size, yyin)) == 0 ) \
            if (ferror(yyin))\
               YY_FATAL_ERROR( "fread() in flex scanner failed" );


I'm a little confused because, with no change to the yyunput() routine, I 
can run one of my old programs, which has to replace YY_INPUT() with my
own read routine to toss out '\0's.  I find that this routine, which does
not try to fill up to max_size, works for the regular expressions that
I analyze.  At this point I assume there is something more complicated
going on in the lexer for flex itself than in the small ditty I wrote.

Anyway, if anyone would make some YY_INPUT() substitutions (like use
getc()) and see what happens when they recompile and test flex itself
we'd all be very happy.  I work for a university that is about to start
classes so I'm overwhelmed right now.
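
For anyone who wants to try the getc() substitution, here is a minimal
sketch, assuming the three-argument YY_INPUT(buf,result,max_size) form shown
above.  The helper fill_with_getc() is hypothetical and untested against flex
itself; it fills the buffer one character at a time so that record
boundaries in the underlying file cannot cut a read short.

```c
#include <stdio.h>

/* Hypothetical helper: store up to max_size characters from fp in buf,
   one getc() at a time.  Returns the count stored; 0 means end of file
   or an error, which ferror(fp) tells apart. */
int fill_with_getc(FILE *fp, char *buf, int max_size)
{
    int n = 0;
    int c;

    while (n < max_size && (c = getc(fp)) != EOF)
        buf[n++] = (char)c;
    return n;
}

/* Possible hookup; yyin and YY_FATAL_ERROR are the scanner's own names. */
#define YY_INPUT(buf, result, max_size) \
    do { \
        (result) = fill_with_getc(yyin, (buf), (max_size)); \
        if ((result) == 0 && ferror(yyin)) \
            YY_FATAL_ERROR("getc() in flex scanner failed"); \
    } while (0)
```

This trades speed for predictability: the buffer is always filled to
max_size except at end of file, which is the behavior the scanner appears to
expect from read() on unix.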

-- 
	John Campbell               ...!arizona!naucse!jdc
                                    CAMPBELL@NAUVAX.bitnet
	unix?  Sure send me a dozen, all different colors.

jct@jct.UUCP (jct) (08/30/89)

In article <1664@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
> From article <13603@bloom-beacon.MIT.EDU>, by scs@adam.pika.mit.edu (Steve Summit):
> : In article <1629@naucse.UUCP> jdc@naucse.UUCP (John Campbell) writes:
> :>The new flex reads a large chunk at a
> :>time.  With VMS STREAM-LF files this works just fine--but with "normal"
>
> Stuff deleted
>
> : This sounds like a bug in flex.  If I understand the complaint
> 
> I'm the original poster of the first article.  To date I have not proven
> that there is a bug in flex.  I still believe this is true, but I haven't
> had the time to make the obvious tests in other environments.  I still
> believe that yyunput() is in error in some way, but I would like to have
> some follow up information before I bug Vern Paxson with my worries.
> 
> Stuff deleted
>

I noticed the same problems. I traced it to a macro substitution occurring
at the beginning of a read() buffer. The substitution uses yyunput() to do
the actual change. Since on VMS this is the beginning of a line it shows up 
more easily than on UNIX. I say it's a flat-out bug in yyunput(). I changed 
it to the following (which is less CPU efficient but works):

#ifdef __STDC__
void yyunput (int c, register char *yy_bp)
#else
void yyunput (c, yy_bp)
  int           c;
  register char *yy_bp;
#endif
{
  register char *yy_cp = yy_c_buf_p;

  *yy_cp = yy_hold_char;		/* undo effects of setting up yytext */
  if (yy_cp < yy_ch_buf + 2)
    { 					/* need to shift up to make room */
      register int number_to_move = yy_n_chars + 2;	/* +2 for EOB chars */
      register char *source = &yy_ch_buf [number_to_move];
      register char *dest = source + 1;

      while (source > yy_ch_buf)
	*--dest = *--source;
      yy_cp++;
      yy_bp++;
      yy_n_chars++;
      if (yy_cp < yy_ch_buf + 2)
	YY_FATAL_ERROR ("flex scanner push-back overflow");
    }
  if (yy_cp > yy_bp && yy_cp [-1] == '\n')
    yy_cp [-2] = '\n';
  *--yy_cp = c;
  YY_DO_BEFORE_ACTION;			/* set up yytext again */
}

That is, instead of moving the whole buffer up to the end, just move it up
to make room for 1 more character. I've had no problems since.
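
The shift-by-one idea can be seen in isolation in the toy sketch below.  It
uses a made-up struct pushback_buf and BUF_CAP rather than flex's real
buffer variables, so treat it as an illustration of the technique only, not
as a drop-in replacement for yyunput().

```c
#include <string.h>

#define BUF_CAP 16

/* Toy buffer (not flex's data structures): `len` unread characters
   occupy data[start] .. data[start + len - 1]. */
struct pushback_buf {
    char data[BUF_CAP];
    int start;
    int len;
};

/* Push c back in front of the unread characters.  When no slot is free
   in front, slide the unread characters up by one place only.
   Returns 0 on success, -1 if the buffer is completely full. */
int push_back(struct pushback_buf *b, int c)
{
    if (b->start == 0) {
        if (b->len >= BUF_CAP)
            return -1;                      /* push-back overflow */
        memmove(b->data + 1, b->data, (size_t)b->len);
        b->start = 1;
    }
    b->data[--b->start] = (char)c;
    b->len++;
    return 0;
}
```

Because only one slot is opened per call, nothing depends on where the end
of the valid data lies relative to the full buffer size.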

John Tompkins
occrsh!jct!jct