[gnu.utils.bug] bug in Gnu e?grep / regex.c

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (06/17/89)

I tried porting Gnu e?grep to MS-DOS (Turbo C 2.0).
In the process I found something of a bug, or at least
a piece of not-so-portable code in regex.c.

The program compiled easily, with only a few trivial modifications
such as different include files and an explicit stack size.  I changed
actual code in only one place (the usage message assumes the directory
separator is /).  And of course the makefile had to be changed rather
drastically to get Borland's make to digest it.

At first it worked fine, until I tried a rather complicated regexp and
got "Memory exhausted".  So I recompiled it with -mc (compact memory
model, i.e., far data pointers), after which it gave "Memory exhausted"
for just about anything but fixed strings, regardless of how much
memory was available.

I traced the problem to the following macro, used in function
re_compile_pattern in regex.c:

#define EXTEND_BUFFER \
  { char *old_buffer = bufp->buffer; \
    if (bufp->allocated == (1<<16)) goto too_big; \
    bufp->allocated *= 2; \
    if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \
    if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
      goto memory_exhausted; \
    c = bufp->buffer - old_buffer; \
    b += c; \
    if (fixup_jump) \
      fixup_jump += c; \
    if (laststart) \
      laststart += c; \
    begalt += c; \
    if (pending_exact) \
      pending_exact += c; \
  }

What do you think a stupid compiler with 16-bit ints makes out of an
expression like 1<<16?  Right, zero.  I substituted 1L<<16 (what would
be the aesthetically correct form?)  and changed the definition of
allocated in struct re_pattern_buffer in regex.h from int to long, and
the problem disappeared.  (BTW, is there some machine where this could
harm anything?  I mean, both ints and longs are 32 bits in 32-bit
machines anyway, aren't they?)

So far so good.  But then I tried some even more complicated regexps
and - the machine crashed.  Oh well, debugger out again, and so it
turned out the problem was again in the above macro.
Look at this piece of code:

    c = bufp->buffer - old_buffer; 
    b += c; 

Pointer subtraction is only guaranteed to work when both pointers point
into the same object, which is not the case here.  And indeed, in the
80x86 large memory model pointer subtraction is done by subtracting
offsets only, which is OK as long as individual objects are <64K
*and the segments are the same*.  And here they may not be.

Using huge pointers would solve the problem but waste time,
and I wanted a portable (and standard-conforming) solution.
This one seems to fit the bill:


#define EXTEND_BUFFER \
  { char *old_buffer = bufp->buffer; \
    if (bufp->allocated == (1L<<16)) goto too_big; \
    bufp->allocated *= 2; \
    if (bufp->allocated > (1L<<16)) bufp->allocated = (1L<<16); \
    if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
      goto memory_exhausted; \
    c = b - old_buffer; \
    b = bufp->buffer + c; \
    if (fixup_jump) { \
      c = fixup_jump - old_buffer; \
      fixup_jump = bufp->buffer + c; \
    } \
    if (laststart) { \
      c = laststart - old_buffer; \
      laststart = bufp->buffer + c; \
    } \
    c = begalt - old_buffer; \
    begalt = bufp->buffer + c; \
    if (pending_exact) { \
      c = pending_exact - old_buffer; \
      pending_exact = bufp->buffer + c; \
    } \
  }


I *think*

    b = bufp->buffer + (b - old_buffer); 

etc should also work, but some compiler might rearrange it as

    b = (bufp->buffer - old_buffer) + b;

which again would fail.  Anyway, a decent compiler (like gcc) should
produce equally good code either way.


-- 
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (06/18/89)

In article <920@tukki.jyu.fi> I wrote about problems in porting
Gnu e?grep to MS-DOS.  It turned out I hadn't found them all:
It still failed with certain regexps.  It didn't crash, just gave
wrong results, making the bug much harder to track down.

At this point it fortunately occurred to me that using a large data
model wasn't really necessary after all - the "Memory exhausted" problem
with the small model was only the result of the erroneous test described
in my previous posting.  And indeed, after compiling with the small model
I got perfect results every time.  So I could hunt the bug in the large
data model by tracing the two versions side by side.

The problem was in regex.c, which presumably was written when
compilers that violate its assumptions were but a dream in some
ANSI-committee-member-to-be's mind.  Actually, there were two
related problems:

One, the code assumes that the result of subtracting pointers is
an int.  According to pANS as well as TurboC, it is ptrdiff_t --
and in large data models the latter defines it as long.

Two, occasionally 0 (zero) is passed as a pointer argument without
cast, which I think is valid only when ints and pointers are of the
same size and the representation of null pointer is (int)0.
(If NULL had been used, it would've worked with Turbo C as it
defines NULL as 0L in large data models, but that is something
of a kludge, and might fail with some other compiler.)

Both had the effect of passing 4-byte values to functions
expecting 2-byte ones, with predictable results.

To fix these I changed most ints in regex.c and a few in grep.c
into ptrdiff_t and added casts where necessary (and probably some
that strictly aren't), including ptrdiff_t type constants.

To maintain compatibility with pre-ANSI compilers I added
#ifndef __STDC__
  typedef int ptrdiff_t;
#endif

It is perhaps worth noting that with ANSI-style prototypes
these problems would not have occurred in the first place.

I _think_ there are no more bugs left, but I'll experiment a little
yet to make sure.

By the way, what is the recommended place for sending ports of Gnu
stuff like this - I mean if there are no bugs (which is what this
group is for), only things like different #include's or a makefile?
Where should I send the diffs for this & the TC makefile once I'm
satisfied with the thing?

-- 
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (06/23/89)

Here are the patches I've made to get Gnu e?grep 1.3 to compile
with Turbo C 2.0 (and run after it ...).
Most of the changes are obvious (changed #includes &c)
and isolated with #ifdef __TURBOC__ or #ifdef __MSDOS__.
Otherwise the only change that really had to be made was
changing 1<<16 in EXTEND_BUFFER to 1L<<16 (see previous
posting).  However, I made a number of other changes to
make this compile with large data models also (even though
there isn't really much reason to do that): a number of
ints have been changed to ptrdiff_t and some casts and
declarations added.  One of them perhaps deserves an extra
comment, from getopt.c:

  {
    char c = *nextchar++;
    char *temp = (char *) index (optstring, c);

What is that cast doing there?  index() is of type (char *)
already, no?  Its only effect is to remove a compiler/lint
warning ... which would have been needed: index() isn't
declared, so the compiler thinks it returns an int, and when
ints are 16 bits and pointers 32 ... you get the idea.
I added the declaration and removed the cast (I think this
is a good example of how casts should NOT be used).

The definition of EXTEND_BUFFER in regex.c here is different
from the one I posted earlier.  It has since been brought to my
attention that the earlier one still might not work everywhere:
systems may exist where subtracting pointers to freed
areas cannot be done at all ... this one avoids that problem.

I'm still not perfectly happy with this.  Despite the effort
to make it compile under large data models, it still wouldn't
work too well with them if the buffer grew so big that using
them would actually be necessary; actually I suspect it may
crash if the buffer exceeds 32K.  -- As I am writing this I
realize it almost certainly will, EXTEND_BUFFER still isn't
foolproof: bufp->allocated, now a long, is passed to realloc()
which expects size_t. TurboC's include files with prototypes
prevent this from causing trouble, _as long as it fits_ -
but when it doesn't, truncating it may have all kinds of odd
effects.  Hmmm ... actually it always will fit, the test against
1L<<16 ensures that ... except when it becomes exactly 2^16.
There are also some things which may fail if the regex or a source
line is longer than 32K.  I don't have time to investigate this
further right now (for a week at least :->), but I may return to it
later; if anybody can come up with an example which the small model
version can't handle, I would be very interested in seeing it.

In any event it should still work everywhere it worked before (only
EXTEND_BUFFER should produce different code), that is on machines with
sizeof(int)==sizeof(pointer) and a flat address space.

In addition to the changes needed to make it work, as MS-DOS lacks
a standard help system I added a piece of built-in documentation:
compile with LONGHELP defined (my tcc makefile does) and the one-line
"usage:" message is replaced with a screenful summarizing options &
regex special chars.

Incidentally, I tried compiling this with Microsoft C 5.0 too
(which apparently is full of bugs, I guess I should get 5.1):
the definition of EXTEND_BUFFER was too long for it,
and the compiler actually crashed on dfa.c.

OK, enough blathering: feed this to patch(1) and the result should
compile without further trouble.  Oh yes, the makefile is meant
for Borland's MAKE, I don't know if it'll work with others.


diff -c ./alloca.c tc/alloca.c
*** ./alloca.c	Fri Mar  3 12:44:50 1989
--- tc/alloca.c	Fri Jun 23 16:19:28 1989
***************
*** 26,31 ****
--- 26,35 ----
  static char	SCCSid[] = "@(#)alloca.c	1.1";	/* for the "what" utility */
  #endif
  
+ #ifdef __MSDOS__
+ #define STACK_DIRECTION -1
+ #endif
+ 
  #ifdef emacs
  #include "config.h"
  #ifdef static
diff -c ./dfa.c tc/dfa.c
*** ./dfa.c	Fri Mar  3 13:46:34 1989
--- tc/dfa.c	Fri Jun 23 16:19:29 1989
***************
*** 2220,2223 ****
  		ifree(mp[i].is);
  	}
  	free((char *) mp);
- }
--- 2220,2222 ----
diff -c ./dfa.h tc/dfa.h
*** ./dfa.h	Fri Mar  3 16:53:14 1989
--- tc/dfa.h	Fri Jun 23 16:19:30 1989
***************
*** 1,6 ****
  /* dfa.h - declarations for GNU deterministic regexp compiler
!    Copyright (C) 1988 Free Software Foundation, Inc.
                        Written June, 1988 by Mike Haertel
  
  		       NO WARRANTY
  
--- 1,7 ----
  /* dfa.h - declarations for GNU deterministic regexp compiler
!    Copyright (C) 1988, 1989 Free Software Foundation, Inc.
                        Written June, 1988 by Mike Haertel
+ 		      TurboC mods June, 1989 by Tapani Tarvainen
  
  		       NO WARRANTY
  
***************
*** 103,108 ****
--- 104,112 ----
  You are forbidden to forbid anyone else to use, share and improve
  what you give them.   Help stamp out software-hoarding!  */
  
+ #ifdef __TURBOC__
+ #define USG
+ #endif
  
  #ifdef USG
  #include <string.h>
***************
*** 113,126 ****
  #endif
  
  #ifdef __STDC__
- 
  /* Missing include files for GNU C. */
  /* #include <stdlib.h> */
  typedef int size_t;
  extern void *calloc(int, size_t);
  extern void *malloc(size_t);
  extern void *realloc(void *, size_t);
  extern void free(void *);
  
  extern char *bcopy(), *bzero();
  
--- 117,131 ----
  #endif
  
  #ifdef __STDC__
  /* Missing include files for GNU C. */
  /* #include <stdlib.h> */
+ #ifndef __TURBOC__
  typedef int size_t;
  extern void *calloc(int, size_t);
  extern void *malloc(size_t);
  extern void *realloc(void *, size_t);
  extern void free(void *);
+ #endif
  
  extern char *bcopy(), *bzero();
  
diff -c ./getopt.c tc/getopt.c
*** ./getopt.c	Fri Mar  3 12:44:54 1989
--- tc/getopt.c	Fri Jun 23 16:19:30 1989
***************
*** 1,6 ****
  /* Getopt for GNU.
!    Copyright (C) 1987 Free Software Foundation, Inc.
  
  		       NO WARRANTY
  
    BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
--- 1,8 ----
  /* Getopt for GNU.
!    Copyright (C) 1987, 1989 Free Software Foundation, Inc.
! 			MS-DOS/TurboC mods June 1989 by Tapani Tarvainen
  
+ 
  		       NO WARRANTY
  
    BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
***************
*** 108,119 ****
--- 110,129 ----
     GNU application programs can use a third alternative mode in which
     they can distinguish the relative order of options and other arguments.  */
  
+ 
  #include <stdio.h>
  
+ #ifdef __TURBOC__
+ #define USG
+ #include <stdlib.h>
+ #include <string.h>
+ void * alloca (unsigned);
+ #endif
  #ifdef sparc
  #include <alloca.h>
  #endif
  #ifdef USG
+ extern char * index ();
  #define bcopy(s, d, l) memcpy((d), (s), (l))
  #endif
  
***************
*** 358,364 ****
  
    {
      char c = *nextchar++;
!     char *temp = (char *) index (optstring, c);
  
      /* Increment `optind' when we start to process its last character.  */
      if (*nextchar == 0)
--- 368,374 ----
  
    {
      char c = *nextchar++;
!     char *temp = index (optstring, c);
  
      /* Increment `optind' when we start to process its last character.  */
      if (*nextchar == 0)
diff -c ./grep.c tc/grep.c
*** ./grep.c	Fri Mar  3 18:05:52 1989
--- tc/grep.c	Fri Jun 23 16:19:31 1989
***************
*** 1,8 ****
  /* grep - print lines matching an extended regular expression
!    Copyright (C) 1988 Free Software Foundation, Inc.
                        Written June, 1988 by Mike Haertel
  	              BMG speedups added July, 1988
  			by James A. Woods and Arthur David Olson
  
  		       NO WARRANTY
  
--- 1,9 ----
  /* grep - print lines matching an extended regular expression
!    Copyright (C) 1988, 1989 Free Software Foundation, Inc.
                        Written June, 1988 by Mike Haertel
  	              BMG speedups added July, 1988
  			by James A. Woods and Arthur David Olson
+ 		      MS-DOS/TurboC mods by Tapani Tarvainen June, 1989
  
  		       NO WARRANTY
  
***************
*** 104,118 ****
  In other words, you are welcome to use, share and improve this program.
  You are forbidden to forbid anyone else to use, share and improve
  what you give them.   Help stamp out software-hoarding!  */
  
  #include <ctype.h>
  #include <stdio.h>
  #ifdef USG
- #include <memory.h>
  #include <string.h>
! #else
  #include <strings.h>
! #endif
  #include "dfa.h"
  #include "regex.h"
  
--- 105,134 ----
  In other words, you are welcome to use, share and improve this program.
  You are forbidden to forbid anyone else to use, share and improve
  what you give them.   Help stamp out software-hoarding!  */
+ 
  
+ 
+ #ifdef __TURBOC__
+ #define USG
+ #endif
+ 
  #include <ctype.h>
  #include <stdio.h>
+ 
  #ifdef USG
  #include <string.h>
! #ifdef __TURBOC__
! #include <stdlib.h>
! #include <alloc.h>
! #include <mem.h>
! unsigned _stklen = 20000;
! #else !__TURBOC__
! #include <memory.h>
! #endif __TURBOC__
! #else !USG
  #include <strings.h>
! #endif USG
! 
  #include "dfa.h"
  #include "regex.h"
  
***************
*** 298,310 ****
  static
  grep()
  {
!   int retain = 0;		/* Number of bytes to retain on next call
  				   to fill_buffer_retaining(). */
    char *search_limit;		/* Pointer to the character after the last
  				   newline in the buffer. */
    char saved_char;		/* Character after the last newline. */
    char *resume;			/* Pointer to where to resume search. */
!   int resume_index = 0;		/* Count of characters to ignore after
  				   refilling the buffer. */
    int line_count = 1;		/* Line number. */
    int try_backref;		/* Set to true if we need to verify the
--- 314,326 ----
  static
  grep()
  {
!   ptrdiff_t retain = 0;		/* Number of bytes to retain on next call
  				   to fill_buffer_retaining(). */
    char *search_limit;		/* Pointer to the character after the last
  				   newline in the buffer. */
    char saved_char;		/* Character after the last newline. */
    char *resume;			/* Pointer to where to resume search. */
!   ptrdiff_t resume_index = 0;	/* Count of characters to ignore after
  				   refilling the buffer. */
    int line_count = 1;		/* Line number. */
    int try_backref;		/* Set to true if we need to verify the
***************
*** 369,375 ****
  	     a backtracking matcher to make sure the line is a match. */
  	  if (try_backref && re_search(&regex, matching_line,
  				       next_line - matching_line - 1,
! 				       0,
  				       next_line - matching_line - 1,
  				       NULL) < 0)
  	    {
--- 385,391 ----
  	     a backtracking matcher to make sure the line is a match. */
  	  if (try_backref && re_search(&regex, matching_line,
  				       next_line - matching_line - 1,
! 				       (ptrdiff_t)0,
  				       next_line - matching_line - 1,
  				       NULL) < 0)
  	    {
***************
*** 537,544 ****
  usage_and_die()
  {
    fprintf(stderr,
! "usage: %s [-CVbchilnsvwx] [-<num>] [-AB <num>] [-f file] [-e] expr [files]\n",
!           prog);
    exit(ERROR);
  }
  
--- 553,596 ----
  usage_and_die()
  {
    fprintf(stderr,
! "usage: %s [-CVbchilnsvwx] [-<num>] [-AB <num>] [-f file] [-e] expr [files]\n"
! #ifdef LONGHELP
! /* this assumes compiler merges adjacent strings */
! "\n-A <num>  context after match\t\t-h\tdon't display filenames\n"
! "-B <num>  context before match\t\t-i\tignore case\n"
! "-<num>\t  context on each side\t\t-l\tlist files only\n"
! "-V\t  version number\t\t-n\tline numbers\n"
! "-b\t  byte offsets\t\t\t-s\trun silently\n"
! "-c\t  total count only\t\t-v\tnon-matching lines only\n"
! "-e <expr> search for <expr>\t\t-w\tmatch only complete words\n"
! "-f <file> take <expr> from <file>\t-x\tmatch only whole lines\n\n"
! "In the regular expression:\n"
! ".\tany single character\t\t^\tbeginning of line\n"
! #ifndef EGREP
! "\\"
! #endif
! "?\trepeat 0 or 1 times\t\t$\tend of line\n"
! "*\trepeat 0 or more times\t\t\\<\tbeginning of word\n"
! #ifndef EGREP
! "\\"
! #endif
! "+\trepeat 1 or more times\t\t\\>\tend of word\n"
! "[ ]\tcharacter set, [^ ] complement\t"
! #ifdef EGREP
! "( )"
! #else
! "\\( \\)"
! #endif
! "\tgrouping\n"
! #ifndef EGREP
! "\\"
! #endif
! "|\tOR\t\t\t\t\\<n> \ttext inside <n>th parentheses\n"
! "\\w\t[a-zA-Z0-9]\t\t\t\\b\tat the edge of a word\n"
! "\\W\t[^a-zA-Z0-9]\t\t\t\\B\tnot at the edge of a word\n"
! "\\\tliteralize following special character\n"
! #endif
! 	  ,prog);
    exit(ERROR);
  }
  
***************
*** 562,570 ****
--- 614,632 ----
    char *regex_errmesg;		/* Error message from regex routines. */
    char translate[_NOTCHAR];	/* Translate table for case conversion
  				   (needed by the backtracking matcher). */
+   int bmg_setup ();		/* keep lint happy */
  
+ #ifdef __MSDOS__
+   if (prog = strrchr(argv[0], '\\')) {
+     char *p;
+     ++prog;
+     if (p = strrchr(prog, '.'))
+ 	*p = 0;
+   }
+ #else
    if (prog = strrchr(argv[0], '/'))
      ++prog;
+ #endif
    else
      prog = argv[0];
  
***************
*** 754,760 ****
    
    if (regex_errmesg = re_compile_pattern(the_regexp, regexp_len, &regex))
      regerror(regex_errmesg);
!   
    /*
      Find the longest metacharacter-free string which must occur in the
      regexpr, before short-circuiting regexecute() with Boyer-Moore-Gosper.
--- 816,822 ----
    
    if (regex_errmesg = re_compile_pattern(the_regexp, regexp_len, &regex))
      regerror(regex_errmesg);
! 
    /*
      Find the longest metacharacter-free string which must occur in the
      regexpr, before short-circuiting regexecute() with Boyer-Moore-Gosper.
***************
*** 860,866 ****
    char *match;
    char *start = begin;
    char save;			/* regexecute() sentinel */
!   int len;
    char *bmg_search();
  
    if (!bmgexec)			/* full automaton search */
--- 922,928 ----
    char *match;
    char *start = begin;
    char save;			/* regexecute() sentinel */
!   ptrdiff_t len;
    char *bmg_search();
  
    if (!bmgexec)			/* full automaton search */
***************
*** 867,873 ****
      return(regexecute(r, begin, end, newline, count, try_backref));
    else
      {
!       len = end - begin; 
        while ((match = bmg_search((unsigned char *) start, len)) != NULL)
  	{
  	  p = match;		/* narrow search range to submatch line */
--- 929,935 ----
      return(regexecute(r, begin, end, newline, count, try_backref));
    else
      {
!       len = end - begin;
        while ((match = bmg_search((unsigned char *) start, len)) != NULL)
  	{
  	  p = match;		/* narrow search range to submatch line */
diff -c ./regex.c tc/regex.c
*** ./regex.c	Fri Mar  3 12:44:58 1989
--- tc/regex.c	Fri Jun 23 16:19:32 1989
***************
*** 1,5 ****
--- 1,6 ----
  /* Extended regular expression matching and search library.
     Copyright (C) 1985, 1989 Free Software Foundation, Inc.
+ 			MS-DOS/TurboC mods June, 1989 by Tapani Tarvainen
  
     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
***************
*** 38,43 ****
--- 39,52 ----
  
  #else  /* not emacs */
  
+ #ifdef __TURBOC__
+ #define USG
+ #include <stdlib.h>
+ #include <alloc.h>
+ #include <string.h>
+ void * alloca (unsigned);
+ #endif
+ 
  #ifdef USG
  #define bcopy(s,d,n)	memcpy((d),(s),(n))
  #define bcmp(s1,s2,n)	memcmp((s1),(s2),(n))
***************
*** 164,184 ****
  #define PATUNFETCH p--
  
  #define EXTEND_BUFFER \
!   { char *old_buffer = bufp->buffer; \
!     if (bufp->allocated == (1<<16)) goto too_big; \
      bufp->allocated *= 2; \
!     if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \
      if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
        goto memory_exhausted; \
!     c = bufp->buffer - old_buffer; \
!     b += c; \
      if (fixup_jump) \
!       fixup_jump += c; \
      if (laststart) \
!       laststart += c; \
!     begalt += c; \
      if (pending_exact) \
!       pending_exact += c; \
    }
  
  static int store_jump (), insert_jump ();
--- 173,196 ----
  #define PATUNFETCH p--
  
  #define EXTEND_BUFFER \
!   { ptrdiff_t b_ofs = b - bufp->buffer, \
!     fixup_jump_ofs = fixup_jump - bufp->buffer, \
!     laststart_ofs = laststart - bufp->buffer, \
!     begalt_ofs = begalt - bufp->buffer, \
!     pending_exact_ofs = pending_exact - bufp->buffer; \
!     if (bufp->allocated == (1L<<16)) goto too_big; \
      bufp->allocated *= 2; \
!     if (bufp->allocated > (1L<<16)) bufp->allocated = (1L<<16); \
      if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
        goto memory_exhausted; \
!     b = bufp->buffer + b_ofs; \
      if (fixup_jump) \
!       fixup_jump = bufp->buffer + fixup_jump_ofs; \
      if (laststart) \
!       laststart = bufp->buffer + laststart_ofs; \
!     begalt = bufp->buffer + begalt_ofs; \
      if (pending_exact) \
!       pending_exact = bufp->buffer + pending_exact_ofs; \
    }
  
  static int store_jump (), insert_jump ();
***************
*** 199,205 ****
    /* address of the count-byte of the most recently inserted "exactn" command.
      This makes it possible to tell whether a new exact-match character
      can be added to that command or requires a new "exactn" command. */
!      
    char *pending_exact = 0;
  
    /* address of the place where a forward-jump should go
--- 211,217 ----
    /* address of the count-byte of the most recently inserted "exactn" command.
      This makes it possible to tell whether a new exact-match character
      can be added to that command or requires a new "exactn" command. */
! 
    char *pending_exact = 0;
  
    /* address of the place where a forward-jump should go
***************
*** 706,712 ****
       struct re_pattern_buffer *bufp;
  {
    unsigned char *pattern = (unsigned char *) bufp->buffer;
!   int size = bufp->used;
    register char *fastmap = bufp->fastmap;
    register unsigned char *p = pattern;
    register unsigned char *pend = pattern + size;
--- 718,724 ----
       struct re_pattern_buffer *bufp;
  {
    unsigned char *pattern = (unsigned char *) bufp->buffer;
!   ptrdiff_t size = bufp->used;
    register char *fastmap = bufp->fastmap;
    register unsigned char *p = pattern;
    register unsigned char *pend = pattern + size;
***************
*** 886,895 ****
  re_search (pbufp, string, size, startpos, range, regs)
       struct re_pattern_buffer *pbufp;
       char *string;
!      int size, startpos, range;
       struct re_registers *regs;
  {
!   return re_search_2 (pbufp, 0, 0, string, size, startpos, range, regs, size);
  }
  
  /* Like re_match_2 but tries first a match starting at index STARTPOS,
--- 898,908 ----
  re_search (pbufp, string, size, startpos, range, regs)
       struct re_pattern_buffer *pbufp;
       char *string;
!      ptrdiff_t size, startpos, range;
       struct re_registers *regs;
  {
!   return re_search_2 (pbufp, (char *)0, (ptrdiff_t)0,
! 		      string, size, startpos, range, regs, size);
  }
  
  /* Like re_match_2 but tries first a match starting at index STARTPOS,
***************
*** 908,928 ****
  re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop)
       struct re_pattern_buffer *pbufp;
       char *string1, *string2;
!      int size1, size2;
!      int startpos;
!      register int range;
       struct re_registers *regs;
!      int mstop;
  {
    register char *fastmap = pbufp->fastmap;
    register unsigned char *translate = (unsigned char *) pbufp->translate;
!   int total = size1 + size2;
    int val;
  
    /* Update the fastmap now if not correct already */
    if (fastmap && !pbufp->fastmap_accurate)
      re_compile_fastmap (pbufp);
!   
    /* Don't waste time in a long search for a pattern
       that says it is anchored.  */
    if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
--- 921,941 ----
  re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop)
       struct re_pattern_buffer *pbufp;
       char *string1, *string2;
!      ptrdiff_t size1, size2;
!      ptrdiff_t startpos;
!      register ptrdiff_t range;
       struct re_registers *regs;
!      ptrdiff_t mstop;
  {
    register char *fastmap = pbufp->fastmap;
    register unsigned char *translate = (unsigned char *) pbufp->translate;
!   ptrdiff_t total = size1 + size2;
    int val;
  
    /* Update the fastmap now if not correct already */
    if (fastmap && !pbufp->fastmap_accurate)
      re_compile_fastmap (pbufp);
! 
    /* Don't waste time in a long search for a pattern
       that says it is anchored.  */
    if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
***************
*** 946,954 ****
  	{
  	  if (range > 0)
  	    {
! 	      register int lim = 0;
  	      register unsigned char *p;
! 	      int irange = range;
  	      if (startpos < size1 && startpos + range >= size1)
  		lim = range - (size1 - startpos);
  
--- 959,967 ----
  	{
  	  if (range > 0)
  	    {
! 	      register ptrdiff_t lim = 0;
  	      register unsigned char *p;
! 	      ptrdiff_t irange = range;
  	      if (startpos < size1 && startpos + range >= size1)
  		lim = range - (size1 - startpos);
  
***************
*** 1008,1017 ****
  re_match (pbufp, string, size, pos, regs)
       struct re_pattern_buffer *pbufp;
       char *string;
!      int size, pos;
       struct re_registers *regs;
  {
!   return re_match_2 (pbufp, 0, 0, string, size, pos, regs, size);
  }
  #endif /* emacs */
  
--- 1021,1031 ----
  re_match (pbufp, string, size, pos, regs)
       struct re_pattern_buffer *pbufp;
       char *string;
!      ptrdiff_t size, pos;
       struct re_registers *regs;
  {
!   return re_match_2 (pbufp, (char *)0, (ptrdiff_t)0,
! 		     string, size, pos, regs, size);
  }
  #endif /* emacs */
  
***************
*** 1040,1047 ****
  re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
       struct re_pattern_buffer *pbufp;
       unsigned char *string1, *string2;
!      int size1, size2;
!      int pos;
       struct re_registers *regs;
       int mstop;
  {
--- 1054,1061 ----
  re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
       struct re_pattern_buffer *pbufp;
       unsigned char *string1, *string2;
!      ptrdiff_t size1, size2;
!      ptrdiff_t pos;
       struct re_registers *regs;
       int mstop;
  {
***************
*** 1591,1598 ****
  re_exec (s)
       char *s;
  {
!   int len = strlen (s);
!   return 0 <= re_search (&re_comp_buf, s, len, 0, len, 0);
  }
  
  #endif /* emacs */
--- 1605,1613 ----
  re_exec (s)
       char *s;
  {
!   ptrdiff_t len = strlen (s);
!   return 0 <= re_search (&re_comp_buf, s, len, (ptrdiff_t)0,
! 			 len, (ptrdiff_t)0);
  }
  
  #endif /* emacs */
***************
*** 1681,1687 ****
  
        gets (pat);	/* Now read the string to match against */
  
!       i = re_match (&buf, pat, strlen (pat), 0, 0);
        printf ("Match value %d.\n", i);
      }
  }
--- 1696,1703 ----
  
        gets (pat);	/* Now read the string to match against */
  
!       i = re_match (&buf, pat, strlen (pat), (ptrdiff_t)0,
! 		    (struct re_registers *)0);
        printf ("Match value %d.\n", i);
      }
  }
diff -c ./regex.h tc/regex.h
*** ./regex.h	Fri Mar  3 12:44:58 1989
--- tc/regex.h	Fri Jun 23 16:19:32 1989
***************
*** 1,5 ****
--- 1,6 ----
  /* Definitions for data structures callers pass the regex library.
     Copyright (C) 1985, 1989 Free Software Foundation, Inc.
+ 			MS-DOS/TurboC mods June, 1989 by Tapani Tarvainen
  
     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
***************
*** 21,26 ****
--- 22,31 ----
     what you give them.   Help stamp out software-hoarding!  */
  
  
+ #ifndef __STDC__
+ typedef int ptrdiff_t;
+ #endif
+ 
  /* Define number of parens for which we record the beginnings and ends.
     This affects how much space the `struct re_registers' type takes up.  */
  #ifndef RE_NREGS
***************
*** 72,79 ****
  struct re_pattern_buffer
    {
      char *buffer;	/* Space holding the compiled pattern commands. */
!     int allocated;	/* Size of space that  buffer  points to */
!     int used;		/* Length of portion of buffer actually occupied */
      char *fastmap;	/* Pointer to fastmap, if any, or zero if none. */
  			/* re_search uses the fastmap, if there is one,
  			   to skip quickly over totally implausible characters */
--- 77,84 ----
  struct re_pattern_buffer
    {
      char *buffer;	/* Space holding the compiled pattern commands. */
!     long allocated;	/* Size of space that  buffer  points to */
!     long used;		/* Length of portion of buffer actually occupied */
      char *fastmap;	/* Pointer to fastmap, if any, or zero if none. */
  			/* re_search uses the fastmap, if there is one,
  			   to skip quickly over totally implausible characters */
***************
*** 175,180 ****
--- 180,186 ----
  extern void re_compile_fastmap ();
  extern int re_search (), re_search_2 ();
  extern int re_match (), re_match_2 ();
+ extern int re_set_syntax ();
  
  /* 4.2 bsd compatibility (yuck) */
  extern char *re_comp ();
*** /dev/null	Fri Jun 23 03:34:40 1989
--- tc/makefile.tcc	Fri Jun 23 16:19:31 1989
***************
*** 0 ****
--- 1,38 ----
+ #
+ # Makefile for GNU e?grep
+ #
+ # TurboC version by Tapani Tarvainen June 1989
+ 
+ CC = TCC
+ 
+ !if $(DEBUG)
+ CFLAGS = -DLONGHELP $(MODEL) -O -A -f- -d -k -G -Z- -w-amb -w-pia -N -y -v
+ !else
+ CFLAGS = -DLONGHELP $(MODEL) -O -A -f- -d -k- -G -Z -w-amb -w-pia
+ !endif
+ 
+ #
+ # Add wildargs.obj (supplied with Turbo C), if TC hasn't been
+ # installed to link it in automatically (or e?grep won't
+ # understand wildcards in filenames). 
+ # 
+ OBJS = dfa.obj regex.obj getopt.obj alloca.obj  
+ GOBJ = grep.obj 
+ EOBJ = egrep.obj
+ 
+ .c.obj:
+ 	$(CC) $(CFLAGS) -c $<
+ 
+ all: egrep grep
+ 
+ egrep: $(OBJS) $(EOBJ)
+ 	$(CC) $(CFLAGS) -eegrep $(OBJS) $(EOBJ) $(LIBS)
+ 
+ egrep.obj: grep.c
+ 	$(CC) $(CFLAGS) -DEGREP -c -oegrep grep.c
+ 
+ grep: $(OBJS) $(GOBJ)
+ 	$(CC) $(CFLAGS) -egrep $(OBJS) $(GOBJ) $(LIBS)
+ 
+ #dfa.obj egrep.obj grep.obj: dfa.h
+ #egrep.obj grep.obj regex.obj: regex.h
-- 
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi