dg@lakart.UUCP (David Goodenough) (08/15/89)
I am looking for code that will decide where English words should be hyphenated. I'm not too worried about exactly what form of output it produces, since the application I'm writing can be modified to suit. For example, my first attempt simply looked for vccv (vowel-consonant-consonant-vowel) combinations and split between the two consonants. I wrote a subroutine, hyphenate(word, position) char *word; char *position; where word is the word to be hyphenated and position is an array with one entry per letter of word. An entry is set TRUE if a hyphen may be placed _AFTER_ the corresponding letter, so hyphenate("hello", x) sets x[2] (the first 'l') and clears every other element of x. This is my current interface; if the code provided does something different, I'll figure out how to adapt it. C source is my first choice; I'll take Pascal if pushed. Thanks in advance,
-- dg@lakart.UUCP - David Goodenough | IHS | !harvard!xait!lakart!dg | AKA: dg%lakart.uucp@xait.xerox.com
cck@deneb.ucdavis.edu (Earl H. Kinmonth) (08/18/89)
In article <654@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
> I am looking for code that will decide where English words should be hyphenated. I'm not too worried exactly what form of output it produces, since the application I'm writing can be modified to suit.
C CHEST AND OTHER C TREASURES FROM Dr. DOBB'S JOURNAL (M & T Books, 1987) contains such code (in C). You can get a disk for $25.00 with this and other items on it. You might also check the items available from the C-Users Group.
dg@lakart.UUCP (David Goodenough) (08/21/89)
In article <654@lakart.UUCP> dg@lakart.UUCP (Who, Me?) writes:
> I am looking for code that will decide where English words should be hyphenated. I'm not too worried exactly what form of output it produces, since the application I'm writing can be modified to suit.
I have been told that TeX contains a really sharp hyphenation algorithm, so:
1. Is the TeX source in the public domain?
2. If so, can someone e-mail me the relevant bits? I'll take C or Pascal, or anything else; it's all going to get translated to something else pretty ugly anyway (Z80 assembler - now aren't you sorry you asked :-) ).
Thanks in advance (again)
ray@ole.UUCP (Ray Berry) (09/16/89)
I am looking for C source code for rule-driven hyphenation of English words. Does anyone have something they could e-mail? Donations and pointers are all encouraged and appreciated. Thank you.
-- Ray Berry kb7ht uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */ Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422
matthew@sunpix.UUCP (Sun Visualization Products) (09/27/89)
In article <1333@ole.UUCP> ray@ole.UUCP (Ray Berry) writes:
> I am looking for c src code for rule-driven hyphenation of english words. Does anyone have something they could e-mail? Donations, pointers- all are encouraged/appreciated. Thank you.
I've seen a book called 'C Chest and Other C Treasures', which is a collection of DDJ articles reprinted from the C Chest column. One of the chapters in the book is dedicated to a rule-driven hyphenation program.
-- Matthew Lee Stier | Sun Microsystems, RTP, NC 27709-3447 | "Wisconsin Escapee" | uucp: sun!mstier or mcnc!rti!sunpix!matthew | phone: (919) 469-8300 fax: (919) 460-8355
raymond@hilbert.berkeley.edu (Raymond Chen) (09/28/89)
In article <1333@ole.UUCP> ray@ole.UUCP (Ray Berry) writes:
> I am looking for c src code for rule-driven hyphenation of english words. Does anyone have something they could e-mail? Donations, pointers- all are encouraged/appreciated. Thank you.
If you're after perfection, look at Appendix H of Knuth's TeXbook. It describes the hyphenation algorithm used by the TeX program (which is in turn based on a Stanford Ph.D. thesis). The algorithm itself is really simple. It misses only 14 of the commonly used words in the English language (4 of them being "present", "presents", "project" and "projects", which can be hyphenated in two different ways depending on the context). The TeX Users' Group (TUG) keeps a list of all known words that the algorithm fails to hyphenate correctly. (Trust me, the words on the list are words you'd never use. How often do you have to hyphenate "Grothendieck"?) In most cases, the only error the algorithm makes is missing hyphenation points; it rarely places a hyphen where there shouldn't be one. Disclaimer: This is from memory. I hope you get the idea of what I'm saying (i.e., read Appendix H, and get a copy of the hyphen.tex file from somebody). Any errors in this article are unintentional and were made in good faith.
lee@uhccux.uhcc.hawaii.edu (Greg Lee) (09/28/89)
From article <1989Sep27.235236.22920@agate.berkeley.edu>, by raymond@hilbert.berkeley.edu (Raymond Chen):
> ... If you're after perfection, look at appendix H of Knuth's TeXbook. ... In most cases, the only error in the algorithm is that it misses hyphenation points. It rarely places a hyphen where there shouldn't be one.
I'll give an algorithm that is even more perfect than this: Don't hyphenate. Then you _never_ place a hyphen where there shouldn't be one. To put the matter more straightforwardly, TeX's hyphenation misses perfection by a considerable margin, since it misses many good hyphenation points.
Greg, lee@uhccux.uhcc.hawaii.edu