[comp.sys.ibm.pc.programmer] FAST pattern matching routines in C

cs106132@umbc5.umbc.edu (cs106132) (05/17/91)

  Hi,
here is a less than optimized implementation of a VERY fast pattern
matching algorithm called Knuth-Morris-Pratt.  It is a must in
almost everyone's library.  Forget the good old "slide over one"
method to search for patterns.  KMP is a screamer with linear time
complexity.  The book I got this out of proves that, but theory is
not my favorite subject.  Only if it was in 80x86 assembler!

/****************************************************************/
/* this function will build a next[] array for the pattern */
int  build_next (signed char next[], unsigned char pat[], unsigned m) {

   int  found;
   int  j = 0;
   int  k = -1;

   next[0] = -1;
   while (j < m) {
      found = 0;
      while (k >= 0 && (! found)) {
	 if (pat[k] != pat[j])
	    k = next[k];
	 else
	    found = 1;
      }
      k++;
      j++;
      if (pat[j] == pat[k])
	 next[j] = next[k];
      else
	 next[j] = k;
   }
   return (0);
}
/*********************************************************************
  This function will looks for pat[] int text[] using next[]
  It assumes that pat[] is less than 128 bytes long
********************************************************************/
int  KMP (unsigned char text[], unsigned n, unsigned char pat[],
	  unsigned m, signed char next[]) {

   int  i = 0;
   int  j = 0;
   int  pos = 0; 

   while (i < m && j < n) {
      if (pat[i] == text[j]) {
	 i++;
	 j++;
      }
      else {
	 pos += (i - next[i]); 
	 if (next[i] >= 0))
	    i = next[i];
	 else {
	    i = 0;
	    j++;
	 }
      }
   }
   if (i == m) {
      cprintf ("%d\r\n", pos); 
      return (0);  /* pos is where found start */
   }
   else
      return (-1);
}

/*********************************************************************/
   I need an 80x86 assembler version of the routines above, especially
KMP.  If any assembly gurus out there take the challenge and post a 
C-callable version of this, it would be much appreciated.

Regards,
Tarkan