[comp.sys.ibm.pc] Hyphenation code wanted

ray@ole.UUCP (Ray Berry) (09/16/89)

    I am looking for c src code for rule-driven hyphenation of english
words.  Does anyone have something they could e-mail?  Donations, pointers-
all are encouraged/appreciated.  Thank you.
-- 
Ray Berry  kb7ht  uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */
Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422

matthew@sunpix.UUCP (Sun Visualization Products) (09/27/89)

In article <1333@ole.UUCP> ray@ole.UUCP (Ray Berry) writes:
|
|    I am looking for c src code for rule-driven hyphenation of english
|words.  Does anyone have something they could e-mail?  Donations, pointers-
|all are encouraged/appreciated.  Thank you.
|-- 
|Ray Berry  kb7ht  uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */
|Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422


I've seen a book called 'C Chest and Other Treasures', which is a reprint
of DDJ articles from the C Chest column.

One of the chapters in the book is dedicated to a rule-driven hyphenation
program.


-- 
Matthew Lee Stier                            |
Sun Microsystems ---  RTP, NC  27709-3447    |     "Wisconsin   Escapee"
uucp:  sun!mstier or mcnc!rti!sunpix!matthew |
phone: (919) 469-8300 fax: (919) 460-8355    |

raymond@hilbert.berkeley.edu (Raymond Chen) (09/28/89)

In article <1333@ole.UUCP> ray@ole.UUCP (Ray Berry) writes:
|    I am looking for c src code for rule-driven hyphenation of english
|words.  Does anyone have something they could e-mail?  Donations, pointers-
|all are encouraged/appreciated.  Thank you.
|-- 
|Ray Berry  kb7ht  uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */
|Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422

If you're after perfection, look at appendix H of Knuth's TeXbook.  It
describes the hyphenation algorithm used by the TeX program (which is
in turn based on a Stanford Ph.D. thesis).  The algorithm itself is
really simple.  It misses only 14 of the commonly-used words in the
English language (4 of them being "present" "presents" "project" and
"projects", which can be hyphenated in two different ways, depending on
the context).  The TeX Users' Group (TUG) has a list of all known words
which the algorithm fails to hyphenate correctly.  (Trust me, the words
on the list are words you'd never use.  How often do you have to
hyphenate "Grothendieck"?)  In most cases, the only error in the
algorithm is that it misses hyphenation points.  It rarely places a
hyphen where there shouldn't be one.

Disclaimer:  This is from memory.  I hope you get the idea of what
	I'm saying (i.e., read Appendix H, and get a copy of the hyphen.tex
	file from somebody).  Any errors in this article are unintentional
	and were made in good faith.
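
For readers who want to see the shape of the pattern-based approach
described above, here is a minimal sketch in C.  It is not the TeX
implementation: TeX stores its patterns in a packed trie, while this
sketch scans a small array linearly.  The nine patterns are the ones
recalled from the worked example for "hyphenation" in Appendix H and are
illustrative only; check them against a real hyphen.tex before relying on
them.  The names hyphenate, split_pattern, MAX_WORD, LEFT_MIN and
RIGHT_MIN are hypothetical, and the minimum fragment lengths (2 letters
before a break, 3 after) are assumed to match TeX's usual defaults.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_WORD  64   /* longest word this sketch accepts (assumption) */
#define LEFT_MIN  2    /* letters required before a break (assumption)  */
#define RIGHT_MIN 3    /* letters required after a break (assumption)   */

/* Illustrative patterns only; a real table has several thousand entries. */
static const char *const patterns[] = {
    "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"
};

/* Split a pattern such as "hen5at" into its letters ("henat") and the
 * digit standing in front of each letter.  digits[i] is the value before
 * letters[i]; digits[len] is the value after the last letter.
 * Returns the number of letters. */
static size_t split_pattern(const char *pat, char *letters, int *digits)
{
    size_t n = 0;

    digits[0] = 0;
    for (; *pat; pat++) {
        if (isdigit((unsigned char)*pat)) {
            digits[n] = *pat - '0';
        } else {
            letters[n++] = *pat;
            digits[n] = 0;
        }
    }
    letters[n] = '\0';
    return n;
}

/* Copy word into out with '-' at each permitted break.
 * Returns 0 on success, -1 if the word is empty, too long, or out is
 * too small (2 * strlen(word) bytes always suffice). */
int hyphenate(const char *word, char *out, size_t outsz)
{
    char   buf[MAX_WORD + 3];          /* ".word." plus the NUL      */
    int    score[MAX_WORD + 4] = {0};  /* score[j]: gap before buf[j] */
    size_t len, i, k, p, o = 0;

    assert(word != NULL && out != NULL);
    len = strlen(word);
    if (len == 0 || len > MAX_WORD || outsz < 2 * len)
        return -1;

    /* Lower-case the word and mark both ends with '.'. */
    buf[0] = '.';
    for (i = 0; i < len; i++)
        buf[i + 1] = (char)tolower((unsigned char)word[i]);
    buf[len + 1] = '.';
    buf[len + 2] = '\0';

    /* For every pattern occurrence, keep the largest digit per gap. */
    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        char   letters[16];
        int    digits[17];
        size_t plen = split_pattern(patterns[p], letters, digits);

        for (i = 0; i + plen <= len + 2; i++) {
            if (strncmp(buf + i, letters, plen) != 0)
                continue;
            for (k = 0; k <= plen; k++)
                if (digits[k] > score[i + k])
                    score[i + k] = digits[k];
        }
    }

    /* An odd value permits a break; an even value forbids one. */
    for (i = 0; i < len; i++) {
        if (i >= LEFT_MIN && len - i >= RIGHT_MIN && (score[i + 1] & 1))
            out[o++] = '-';
        out[o++] = word[i];
    }
    out[o] = '\0';
    return 0;
}
```

With only these nine patterns loaded, hyphenate("hyphenation", out,
sizeof out) fills out with "hy-phen-ation".  Higher digits override lower
ones, which is how a specific pattern such as "hen5at" can overrule the
more general "n2at" at the same position.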

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (09/28/89)

From article <1989Sep27.235236.22920@agate.berkeley.edu>, by raymond@hilbert.berkeley.edu (Raymond Chen):
" ...
" If you're after perfection, look at appendix H of Knuth's TeXbook.
" ...  In most cases, the only error in the
" algorithm is that it misses hyphenation points.  It rarely places a
" hyphen where there shouldn't be one.

I'll give an algorithm that is even more perfect than this: Don't
hyphenate.  Then you _never_ place a hyphen where there shouldn't
be one.

To put the matter more straightforwardly, TeX's hyphenation
misses perfection by a considerable margin, since it misses
many good hyphenation points.

			Greg, lee@uhccux.uhcc.hawaii.edu