levy@ttrdc.UUCP (Daniel R. Levy) (12/07/85)
In article <13666@rochester.UUCP>, ken@rochester.UUCP (Ipse dixit) writes: >In article <7839@ucla-cs.ARPA> jimc@ucla-cs.UUCP (Jim Carter) writes: >>(Bug: if the pattern is ROUTE and the text is ROUROUTE, the pattern will >>be missed. This is an easy fix. But if the pattern is TINTINABULATION >>and the text is TINTINTINABULATION, even with the above fix the pattern >>will be missed. It could be seen only with long and expensive pattern >>checking -- or does someone see a smart way to do it? > >Yes, in the chapter on string matching in Design and Analysis of Computer >Algorithms (Aho, Hopcroft and Ullman?) you can see how to construct a >finite state machine to recognize a substring anywhere in the input ^^^^^ >word. Basically, when a failure to match occurs, the machine goes back ^^^^ >to the last state which it could be in. In your second example, when >it fails to match the third T it goes back to the state it would have >been in after seeing TINT. All the states can be computed in advance and the >algorithm runs in linear time. > Ken Input WORD? The input to this example was a stream of characters, NOT a series of "words", if I can remember correctly. Does this algorithm deal with STREAMS, too? (I.e., inability to back up, no internal buf- fering.) Perhaps a bit dirtily, a buffer (big enough to hold the longest possible pattern which is to be matched) might be used, filling it and flushing it, watching for the first character of the pattern to be matched and doing a more complete check as necessary. This would be a worst case linear time algorithm, too. Example: Pattern to be matched: ABABC Stream: BABABABCABDABEBCDEFGHIJKLMNABABABABCDEFG ----- ----- Buffer: BABAB; Stream: ABCABDABEBCDEFGHIJKLMNABABABABCDEFG Flush out: B come to A; shift and fill; Buffer: ABABA; Stream: BCABDABEBCDEFGHIJKLMNABABABABCDEFG ABABA != ABABC Flush out: A,B come to A; shift and fill; Buffer: ABABC; Stream: ABDABEBCDEFGHIJKLMNABABABABCDEFG ABABC == ABABC Output pattern with newline: '\n'ABABC Fill buffer Buffer: ABDAB; Stream: EBCDEFGHIJKLMNABABABABCDEFG Buffer begins with A; ABDAB != ABABC Flush out: A,B,D come to A; shift and fill; Buffer: ABEBC; Stream: DEFGHIJKLMNABABABABCDEFG ABEBC != ABABC Flush out: A,B,E,B,C Fill buffer Buffer: DEFGH; Stream: IJKLMNABABABABCDEFG Flush out: D,E,F,G,H Fill buffer Buffer: IJKLM; Stream: NABABABABCDEFG Flush out: I,J,K,L,M Fill buffer Buffer: NABAB; Stream: ABABCDEFG Flush out: N Come to A; shift and fill; Buffer: ABABA; Stream: BABCDEFG ABABA != ABABC Flush out: A,B Come to A; shift and fill; Buffer: ABABA; Stream: BCDEFG ABABA != ABABC Flush out: A,B Come to A; shift and fill; Buffer: ABABC; Stream: DEFG ABABC == ABABC Output pattern with newline: '\n'ABABC Fill buffer Buffer: DEFG_ [EOF flag set]; Stream: empty Flush out: D,E,F,G (For good measure: output '\n') -- ------------------------------- Disclaimer: The views contained herein are | dan levy | yvel nad | my own and are not at all those of my em- | an engihacker @ | ployer or the administrator of any computer | at&t computer systems division | upon which I may hack. | skokie, illinois | -------------------------------- Path: ..!ihnp4!ttrdc!levy
ken@rochester.UUCP (Ipse dixit) (12/09/85)
In article <624@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes: >Input WORD? The input to this example was a stream of characters, NOT >a series of "words", if I can remember correctly. Does this algorithm >deal with STREAMS, too? (I.e., inability to back up, no internal buf- >fering.) Sorry for the misunderstanding but I was using word in the theoretical sense, i.e. a string of symbols. No backtrack is required, it is totally deterministic. Ken -- UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa USnail: Dept. of Comp. Sci., U. of Rochester, NY 14627. Voice: Ken!
steve@hcradm.UUCP (Steve Pozgaj) (12/10/85)
Hey, c'mon people! Something very obvious has been missed here. Namely, the damn program should have been written in SNOBOL in the first place. You want to write operating systems, you use C; you want to match patterns, you use SNOBOL:-) -- Everyone's a SNOBOL freak at heart! It's only the brave who admit it.
abh6509@ritcv.UUCP (Xandrew) (12/13/85)
In article <2433@hcradm.UUCP> steve@hcradm.UUCP (Steve Pozgaj) writes: >Hey, c'mon people! Something very obvious has been missed here. Namely, >the damn program should have been written in SNOBOL in the first place. >You want to write operating systems, you use C; you want to match patterns, >you use SNOBOL:-) >-- >Everyone's a SNOBOL freak at heart! It's only the brave who admit it. How right you are! Now show us how to link a SNOBOL file into a C module......
chai@utflis.UUCP (H. Chai) (12/13/85)
In article <2433@hcradm.UUCP> steve@hcradm.UUCP (Steve Pozgaj) writes: >You want to write operating systems, you use C; you want to match patterns, >you use SNOBOL:-) >-- >Everyone's a SNOBOL freak at heart! It's only the brave who admit it. Please, Steve, SNOBOL is *out*. If you want to do pattern matching, the *in* language is Icon! 8-) (and it's not yuppie!) -- Henry Chai, just a humble student at the Faculty of Library and Information Science, U of Toronto {watmath,ihnp4,allegra}!utzoo!utflis!chai
peterch@tekig4.UUCP (Peter Chao) (12/18/85)
In article <634@utflis.UUCP> chai@utflis.UUCP (H. Chai) writes: >In article <2433@hcradm.UUCP> steve@hcradm.UUCP (Steve Pozgaj) writes: >>You want to write operating systems, you use C; you want to match patterns, >>you use SNOBOL:-) >>-- >>Everyone's a SNOBOL freak at heart! It's only the brave who admit it. > >Please, Steve, SNOBOL is *out*. If you want to do pattern matching, >the *in* language is Icon! 8-) >(and it's not yuppie!) > >-- >Henry Chai, just a humble student at the >Faculty of Library and Information Science, U of Toronto >{watmath,ihnp4,allegra}!utzoo!utflis!chai ------------------- I second Henry C.. Icon, developed by the same author(s?) that developed Snobol, is far more general and 'powerful'. Even though I have a biased preference for the beauty of simplicity of Snobol, I must agree that Icon serves better practical purposes. Peter S. Chao
ark@alice.UucP (Andrew Koenig) (12/20/85)
> Even though I have a biased preference for the beauty of simplicity of > Snobol, I must agree that Icon serves better practical purposes. Yes and no. Thought experiment: imagine that the syntax of SNOBOL4 were modernized. What would then be your complaints about the language?
agw@garfield.columbia.edu (Art Werschulz) (12/24/85)
This may not be entirely relevant to this particular discusssion, but ... Does anybody know where I can find a SNOBOL compiler for a VAX running under VMS Version 4.0 (or later)? Thanks. Art Werschulz