mackay@iisat.UUCP (Daniel MacKay) (08/05/88)
My current programming project is a program called "travesty"; it was in a Byte many moons ago. It (in a nutshell) produces text with the same 3-or-4 letter sequence distributions, as a piece of sample text you feed it. I have tried a couple of implementations, both different from Byte's, but in the end I think I'll resort to their solution (in forth, tho, not pathcal). Their solution involved loading the entire text into memory, then scanning the entire buffer every time an output character has to be generated; what you do is look at the last three (or whatever) letters you output, then look for the probability of one more letter following it in the original text. After you've collected the probabilities for all the letters that follow it (i.e. count the # of times each one followed), then you roll a weighted die and pick a letter. Then you go on to the next letter. The program generates hilarious and interesting text from various sources. Now here's my problem: I've been out of forth for a couple of years, and I need a word that can take a buffer address and a string address, I guess, and return the address of the letter following first occurrance of the string in the buffer! Now this has to be a really efficient word, (cause it gets called on a 30-K or so buffer for every letter output) and it seems like the kind of problem that has a very, very simple solution- I'm looking for insight- I can use a big hammer but I know there's an elegant solution (that's what FORTH's all about, right?). Whatcha say, guys? any suggestions? Anyone done this already? ---- +---------+ IIS Public Usenet | _ | From the Halifax, Nova Scotia | (_)===| Disk of ... Canada | | Daniel mackay@iisat.UUCP +---------+ ...{utai,uunet,watmath}!dalcs!iisat!mackay
jax@well.UUCP (Jack J. Woehr) (08/08/88)
In article <99@iisat.UUCP> mackay@iisat.UUCP (Daniel MacKay) writes: > > >Now here's my problem: I've been out of forth for a couple of years, >and I need a word that can take a buffer address and a string address, >I guess, and return the address of the letter following first occurrance of >the string in the buffer! Try this: <xxxx> constant buffersize create foobuffer buffersize allot : find-first-string-match ( addr-of-string count-of-string ---- addr|false) buffersize over - 0 swap 0 \ install dummy fail flag do drop \ dummy fail flag 2dup foobuff i + swap compare \ 0 means perfect match 0= if foobuff i + leave then \ exit with addr, not fail 0 \ restore fail flag loop -rot 2drop ; \ lose srcaddr & ct : find-first-char-after-match ( $addr ct --- addr|false) dup >r \ save count find-first-string-match \ find where the string matches, if it does r> over \ bring back count if + else drop then ; \ if non-zero, addr was found, add count This is not entirely optimal, but if you examine these, and the source in your Forth system for the words called, you will probably be able to code something *real* fast. Good luck! Send us a copy ... :-) ************ jax@well jax@chariot " The only thing better than Forth is beer" JAX on GEnie
A-PIRARD@BLIULG11.BITNET (Andre PIRARD) (08/10/88)
>and I need a word that can take a buffer address and a string address, >I guess, and return the address of the letter following first occurrance of >the string in the buffer! Well, your problem is one of finding the occurence of a string in another one. At the Southern Belgium Fig Chapter, we have been debating for a long time how Comforth should perform this function in terms of stack behaviour. We finally settled for the definition and code below. You will easily be able to turn it to anything you wish. We found the definition of -MATCH suggested by the standard often involving intricate computation on the parameters returned. We like thinking of strings in terms of address and count. MATCH returns what we think is most useful for immediate use: the substring preceding the match and the one following it. The following example shows how easily it is used to type on different lines the parts of a character string separated by commas: : SPLIT \ scon -- BEGIN DUP WHILE " ," MATCH TYPE CR 1 /STRING REPEAT 2DROP ; MATCH scon1 scon2 -- scon3 scon1' Search the first occurrence in scon1 of the characters of scon2. Split the string at the match point, giving scon1' and scon3, such that scon1=scon1'+scon3 and either, when a match exists, scon3=scon2+rest or, when no match or scon2 is the null string, scon3=null. When scon2 is not null, scon3 size may be used as a flag. MATCH is most useful at removing separators from a string and repeatedly handling the first part then continuing with the rest until a null string is encountered. /STRING may be used to behead scon3 by scon2 size to remove the matching part off that rest. scon2 size is limited to 255 characters on some CPU's. Note: our descriptions use the symbol <sconx> to refer to a pair of stack entries containing the address and count of a character string whose contents is not changed by the definition. CODE MATCH \ a1 l1 a2 l2 -- a1' l1' a1 l3 \ find scon2 at a1' in scon1 LABEL NOMATCH DI , CX ADD CX , CX XOR \ l1'=0 when no match LABEL OKMATCH DX , DI MOV SI POP DI POP AX POP DX PUSH CX PUSH \ a1',l1', matching substring AX PUSH DX , AX SUB \ a1,l3=a1'-a1, skipped one NEXT ENTRY: BX POP CX POP AX POP AX PUSH DI PUSH SI PUSH DI , AX MOV AL , 0 [BX] MOV \ DI=a1 CX=l1 BX=a2 DX=l2 AL=first char of 2 DX , DX OR NOMATCH JE \ null string 2 special case DX DEC BX INC BEGIN REPNZ SCASB NOMATCH JNE \ search next first char equality => a1',l1' CX , DX CMP NOMATCH JB \ not enough to compare rest, l1'<l2 DI PUSH CX PUSH \ save where to continue SI , BX MOV CX , DX MOV CX , CX OR REPZ CMPSB \ compare rest (assume equality if null) CX POP DI POP E UNTIL DI DEC CX INC OKMATCH JMP END-CODE \ exit if equal Note: our Comforth 8086 environments keep the top of stack in register DX. This highly enhances performance as in CODE 2* DX , DX ADD. Converting to Forths not using this feature is just a matter of adding DX POP ahead and DX PUSH behind. Andr .