rcpt@rw7.urc.tue.nl (Piet Tutelaers) (04/19/91)
I have spent some time looking at whether `ispell' can be enhanced to support languages other than English (and American English). Ispell as it is now available has several drawbacks in a multi-language environment:

* The reduction rules are based on English and work badly for Dutch, French, etc. (see further on).
* The built-in `detex' (used if the filename ends with .tex) does not recognize LaTeX constructs like \begin{quote}.
* There is no support for diacritical symbols like "i. The Dutch words:

      ge\"{\i}nfecteerd
      ge\"\i nfecteerd
      ge"infecteerd        (supported by the new dutch.sty)

  should all be accepted as valid derivations of the verb `infecteren'. At present the word is split up into `ge' and `nfecteerd'. In the future (X11.5 ?) there will hopefully be better support for European languages.
* Ispell cannot skip over text. If your document contains more than one language it makes no sense to check French words against an English dictionary. It would be great if ispell offered a facility to skip forward and backward in the text being checked. (Try to check this text with ispell!)

A PD Dutch word list containing 150000 words (1.7 Mbytes), processed by `buildhash' without word reduction, gives a 3.5 Mbyte dictionary; reduced by `munchlist' it is still 2.6 Mbytes. Because this hash table is loaded into memory, the program behaves like a snail.

To improve the situation for Dutch we need a reduced word list. The best I can think of is to reduce a word list automatically. It should be possible to write a program that reduces words like:

      gespeeld    jou'e      gespielt    played
      speelt      joue       spielt      plays
      speelde     jouait     spielte     playing
      spelen      jouer      spielen     play
                  jouerai
                  jouons
                  etc.

into their basic stems (e.g. spelen, jouer, spielen, play), together with a function `reduce' (which, given the word `gespeeld', responds with `spelen') and a function `expand' (which, given the word `spelen', returns the words `gespeeld', `speelt', `speelde', etc.).
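To make the `reduce'/`expand' idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the rule classes ("nl-elen", "en-verb"), the Rule tuple and the two-entry stem list are hypothetical and hand-written for this example. They are not taken from ispell or from any existing word list, and the Dutch rule covers only verbs shaped like `spelen'. Each rule says what to put in front of a stem, which ending to strip, and which ending to add; `reduce' simply runs the same rules backwards and checks the result against the stem list.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    prefix: str  # text put in front of the stem
    strip: str   # ending removed from the stem before adding
    add: str     # ending appended after stripping


# Hypothetical rule classes, written by hand for this illustration only.
RULES = {
    "nl-elen": [
        Rule("ge", "len", "eld"),   # spelen -> gespeeld
        Rule("", "len", "elt"),     # spelen -> speelt
        Rule("", "len", "elde"),    # spelen -> speelde
    ],
    "en-verb": [
        Rule("", "", "ed"),         # play -> played
        Rule("", "", "s"),          # play -> plays
        Rule("", "", "ing"),        # play -> playing
    ],
}

# The reduced word list: every stem names the rule class it follows.
STEMS = {"spelen": "nl-elen", "play": "en-verb"}


def expand(stem: str) -> List[str]:
    """Return the derived forms of a known stem (empty list if unknown)."""
    forms = []
    for rule in RULES.get(STEMS.get(stem, ""), []):
        if stem.endswith(rule.strip):
            base = stem[:len(stem) - len(rule.strip)]
            forms.append(rule.prefix + base + rule.add)
    return forms


def reduce(word: str) -> Optional[str]:
    """Return the stem a word derives from, or None if no rule fits."""
    if word in STEMS:
        return word
    for cls, rules in RULES.items():
        for rule in rules:
            if len(word) < len(rule.prefix) + len(rule.add):
                continue
            if word.startswith(rule.prefix) and word.endswith(rule.add):
                # Undo the rule: drop prefix and added ending, restore stripped ending.
                middle = word[len(rule.prefix):len(word) - len(rule.add)]
                candidate = middle + rule.strip
                if STEMS.get(candidate) == cls:
                    return candidate
    return None
```

With this table, reduce("gespeeld") gives "spelen" and expand("spelen") gives the three derived forms listed in the rules. The hard part, which this sketch does not attempt, is deriving such rule classes automatically from a full word list instead of writing them by hand.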
Does anybody have pointers to an existing method or literature on this subject? Perhaps a nice problem for a class on software engineering?

--Piet

internet: rcpt@urc.tue.nl       | Piet Tutelaers Room RC 1.90
bitnet: rcpt@heithe5.BITNET     | Eindhoven University of Technology
phone: +31 (0)40 474541         | P.O. Box 513, 5600 MB Eindhoven, NL