[net.lang] Searches for string in a stream

levy@ttrdc.UUCP (Daniel R. Levy) (12/07/85)

In article <13666@rochester.UUCP>, ken@rochester.UUCP (Ipse dixit) writes:
>In article <7839@ucla-cs.ARPA> jimc@ucla-cs.UUCP (Jim Carter) writes:
>>(Bug: if the pattern is ROUTE and the text is ROUROUTE, the pattern will
>>be missed.  This is an easy fix.  But if the pattern is TINTINABULATION
>>and the text is TINTINTINABULATION, even with the above fix the pattern
>>will be missed.  It could be seen only with long and expensive pattern
>>checking -- or does someone see a smart way to do it?
>
>Yes, in the chapter on string matching in Design and Analysis of Computer
>Algorithms (Aho, Hopcroft and Ullman?) you can see how to construct a
>finite state machine to recognize a substring anywhere in the input
                                                               ^^^^^
>word. Basically, when a failure to match occurs, the machine goes back
 ^^^^
>to the last state which it could be in. In your second example, when
>it fails to match the third T it goes back to the state it would have
>been in after seeing TINT. All the states can be computed in advance and the
>algorithm runs in linear time.
>	Ken

Input WORD?  The input to this example was a stream of characters, NOT
a series of "words", if I can remember correctly.  Does this algorithm
deal with STREAMS, too?  (I.e., inability to back up, no internal buf-
fering.)

Perhaps a bit dirtily, a buffer (big enough to hold the longest
possible pattern which is to be matched) might be used, filling it and
flushing it, watching for the first character of the pattern to be matched
and doing a more complete check as necessary.
This would be a worst case linear time algorithm, too.  Example:

Pattern to be matched: ABABC

Stream: BABABABCABDABEBCDEFGHIJKLMNABABABABCDEFG
           -----                       -----
Buffer: BABAB; Stream: ABCABDABEBCDEFGHIJKLMNABABABABCDEFG
Flush out: B
come to A; shift and fill;
Buffer: ABABA; Stream: BCABDABEBCDEFGHIJKLMNABABABABCDEFG
ABABA != ABABC
Flush out: A,B
come to A; shift and fill;
Buffer: ABABC; Stream: ABDABEBCDEFGHIJKLMNABABABABCDEFG
ABABC == ABABC
Output pattern with newline: '\n'ABABC
Fill buffer
Buffer: ABDAB; Stream: EBCDEFGHIJKLMNABABABABCDEFG
Buffer begins with A;
ABDAB != ABABC
Flush out: A,B,D
come to A; shift and fill;
Buffer: ABEBC; Stream: DEFGHIJKLMNABABABABCDEFG
ABEBC != ABABC
Flush out: A,B,E,B,C
Fill buffer
Buffer: DEFGH; Stream: IJKLMNABABABABCDEFG
Flush out: D,E,F,G,H
Fill buffer
Buffer: IJKLM; Stream: NABABABABCDEFG
Flush out: I,J,K,L,M
Fill buffer
Buffer: NABAB; Stream: ABABCDEFG
Flush out: N
Come to A; shift and fill;
Buffer: ABABA; Stream: BABCDEFG
ABABA != ABABC
Flush out: A,B
Come to A; shift and fill;
Buffer: ABABA; Stream: BCDEFG
ABABA != ABABC
Flush out: A,B
Come to A; shift and fill;
Buffer: ABABC; Stream: DEFG
ABABC == ABABC
Output pattern with newline: '\n'ABABC
Fill buffer
Buffer: DEFG_ [EOF flag set]; Stream: empty
Flush out: D,E,F,G
(For good measure: output '\n')
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!ihnp4!ttrdc!levy

ken@rochester.UUCP (Ipse dixit) (12/09/85)

In article <624@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>Input WORD?  The input to this example was a stream of characters, NOT
>a series of "words", if I can remember correctly.  Does this algorithm
>deal with STREAMS, too?  (I.e., inability to back up, no internal buf-
>fering.)

Sorry for the misunderstanding but I was using word in the theoretical
sense, i.e. a string of symbols. No backtrack is required, it is
totally deterministic.

	Ken
-- 
UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa
USnail:	Dept. of Comp. Sci., U. of Rochester, NY 14627. Voice: Ken!

steve@hcradm.UUCP (Steve Pozgaj) (12/10/85)

Hey, c'mon people!  Something very obvious has been missed here.  Namely,
the damn program should have been written in SNOBOL in the first place.
You want to write operating systems, you use C; you want to match patterns,
you use SNOBOL:-)
--
Everyone's a SNOBOL freak at heart!  It's only the brave who admit it.

abh6509@ritcv.UUCP (Xandrew) (12/13/85)

In article <2433@hcradm.UUCP> steve@hcradm.UUCP (Steve Pozgaj) writes:
>Hey, c'mon people!  Something very obvious has been missed here.  Namely,
>the damn program should have been written in SNOBOL in the first place.
>You want to write operating systems, you use C; you want to match patterns,
>you use SNOBOL:-)
>--
>Everyone's a SNOBOL freak at heart!  It's only the brave who admit it.

How right you are! Now show us how to link a SNOBOL file into a C module......

chai@utflis.UUCP (H. Chai) (12/13/85)

In article <2433@hcradm.UUCP> steve@hcradm.UUCP (Steve Pozgaj) writes:
>You want to write operating systems, you use C; you want to match patterns,
>you use SNOBOL:-)
>--
>Everyone's a SNOBOL freak at heart!  It's only the brave who admit it.

Please, Steve, SNOBOL is *out*.  If you want to do pattern matching,
the *in* language is Icon! 8-)
(and it's not yuppie!)

-- 
Henry Chai, just a humble student at the 
Faculty of Library and Information Science, U of Toronto
{watmath,ihnp4,allegra}!utzoo!utflis!chai        

peterch@tekig4.UUCP (Peter Chao) (12/18/85)

In article <634@utflis.UUCP> chai@utflis.UUCP (H. Chai) writes:
>In article <2433@hcradm.UUCP> steve@hcradm.UUCP (Steve Pozgaj) writes:
>>You want to write operating systems, you use C; you want to match patterns,
>>you use SNOBOL:-)
>>--
>>Everyone's a SNOBOL freak at heart!  It's only the brave who admit it.
>
>Please, Steve, SNOBOL is *out*.  If you want to do pattern matching,
>the *in* language is Icon! 8-)
>(and it's not yuppie!)
>
>-- 
>Henry Chai, just a humble student at the 
>Faculty of Library and Information Science, U of Toronto
>{watmath,ihnp4,allegra}!utzoo!utflis!chai        
-------------------

I second Henry C..

Icon, developed by the same author(s?) that developed Snobol, is far more
general and 'powerful'.

Even though I have a biased preference for the beauty of simplicity of
Snobol, I must agree that Icon serves better practical purposes.

   Peter S. Chao

ark@alice.UucP (Andrew Koenig) (12/20/85)

> Even though I have a biased preference for the beauty of simplicity of
> Snobol, I must agree that Icon serves better practical purposes.

Yes and no.  Thought experiment:  imagine that the syntax of SNOBOL4
were modernized.  What would then be your complaints about the language?

agw@garfield.columbia.edu (Art Werschulz) (12/24/85)

This may not be entirely relevant to this particular discusssion, but ...
Does anybody know where I can find a SNOBOL compiler for a VAX running under
VMS Version 4.0 (or later)?

Thanks.

	Art Werschulz