kay@flame.UUCP (Kay Dekker) (12/02/84)
Lots & lots of people have written to me asking for copies of subs. Several have suggested posting it to net.sources. Though I like getting mail from you all, it would probably be nicer for the people we send mail through if I did post it. I will still send copies by mail for people who mailed me for it, because they may not get the sources newsgroup, but if you do, please get your copy from there. You could still mail me & say 'Hi!' anyway :-) {Enjoy, }+ Kay. -- "But you TOLD me to type 'rm * .o', and I DID, and it said 'rm: .o nonexistent', and ... ... mcvax!ukc!flame!ubu!kay
steven@mcvax.UUCP (Steven Pemberton) (12/03/84)
It was interesting to see this program, not least because I've had a version running here for a couple of years written in the new language B (no relation to C's predecessor). The nice thing about the B version is that it is only 24 lines long, compared with the 198 lines of C. Consequently, I thought it might be worth posting it here as it is so short. An explanation of the relevant parts of B follows the program. Comments begin with the character "\" HOW'TO RECONSTRUCT document USING n GRAMS: PUT " "^^(n-1), {} IN gram, followers FOR line IN document: ANALYSE \ Analyse each line in turn FOR count IN {1..10}: GENERATE \ 10 lines of imitation ANALYSE: \ Analyse one line FOR char IN line: UPDATE followers FOR gram WITH char APPEND char TO gram UPDATE followers FOR gram WITH " " \ Treat line end as space APPEND " " TO gram GENERATE: \ Generate one line PUT 0 IN length CHOOSE gram FROM keys followers \ Choose a random start WHILE gram in keys followers AND (length<50 OR char<>" "): CHOOSE char FROM followers[gram] WRITE char PUT length+1 IN length APPEND char TO gram WRITE / \ Write a newline HOW'TO UPDATE followers FOR gram WITH char: IF gram not'in keys followers: PUT {} IN followers[gram] INSERT char IN followers[gram] HOW'TO APPEND char TO gram: PUT (gram^char)@2 IN gram Sample output, using the poem "Mary had a little lamb" as input, and using i-grams for i=1..4: 1: nhwaso ldm r ttvncd vetsvaen ryam wnMhuaaiaa lryulitlab 2: evere Mad ad ts go Mamb snts s s ad lamb flits it 3: ry hat Marywhery was was fleece that lamb ittleece 4: te as snow and everywhere that lamb was white as white Commentary. B is a very simple language, about as easy to learn as BASIC, but the big difference is that B has very powerful data-types, and it's thanks to these that programming in B is so easy. It has two basic types, numbers and texts, and three constructed types, lists, tables, and compounds. Texts are strings of characters. For instance "hello" is a text. There is no character data-type: you just use a text of length one, such as "a" or "b". There are operators on texts: a^b joins the two texts a and b, a^^n repeats the text a n times, and a@n gives the tail of text a starting at the nth character. Lists are sorted lists of elements. E.g., {"B"; "Pascal"; "Smalltalk"} is a list of texts, as is {"a"; "b"; "c"}, and {"z"}. {} is the empty list. You can insert a new element e in a list l with INSERT e IN l. Although lists are kept sorted (alphabetically in the case of texts) the program doesn't use this fact. The program deals with n-grams, that is, groups of n letters from a text. The B program works by associating with each group of n-1 letters, a list of letters that may follow it. Thus when dealing with trigrams for the sentence "Nonsense imitation can be disconcerting", for the letters "on" you get the list {" "; "c"; "s"}, and for "ns" you get {"e"; "e"}. This association is done using the table data-type, which is a generalisation of arrays: in most other languages, you may only index arrays with integers (or similar), while in B you can use any type. In this program a table called 'followers' is used, indexed by texts and giving lists of characters, so that with the above sentence you get followers["ns"] = {"e"; "e"} for instance. You can find out which indexes have been used for a table: 'keys followers' gives the list of such indexes. {} is also the empty table. Another interesting point is that little of the program needs to be altered for it to work with words instead of characters. UPDATE for instance remains identical. B is an interactive language, and there are implementations of B for machines with Unix, and shortly for the IBM PC without Unix. Anyone who would like to know more about B or its implementations should mail me. Steven Pemberton, CWI, Amsterdam; steven@mcvax.