[comp.sources.wanted] Hyphenation code wanted

dg@lakart.UUCP (David Goodenough) (08/15/89)

I am looking for code that will decide where English words should be
hyphenated. I'm not too worried exactly what form of output it produces,
since the application I'm writing can be modified to suit.

For example, my first run simply looked for vccv combinations, and split
between the two consonants. What I did was to create a subroutine:

	hyphenate(word, position)
	char *word;
	char *position;

where word was the word to be hyphenated, and position was an array
where each entry corresponded to a letter in word, and a given entry
was set TRUE if the corresponding letter in the word could have a hyphen
placed _AFTER_ it - so hyphenate("hello", x) would set x[2] (the first
'l') and clear all other elements of x. This is my current interface;
if the code provided does something different, I'll figure out how
to adapt it.
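
For reference, the vccv rule and the interface described above can be
sketched in C roughly as follows. This is a minimal sketch, not the
poster's actual code: it uses an ANSI prototype instead of the K&R
declaration shown, it treats only a, e, i, o, u as vowels (so 'y' counts
as a consonant), and the helper names is_vowel and is_consonant are made
up for illustration.

```c
#include <ctype.h>
#include <string.h>

#define TRUE  1
#define FALSE 0

/* Assumption: only a, e, i, o, u are vowels; 'y' is a consonant here. */
static int is_vowel(int c)
{
    c = tolower((unsigned char)c);
    /* Guard against '\0', which strchr would otherwise "find". */
    return c != '\0' && strchr("aeiou", c) != NULL;
}

static int is_consonant(int c)
{
    return isalpha((unsigned char)c) && !is_vowel(c);
}

/*
 * position must have room for strlen(word) entries.
 * position[i] is set TRUE if a hyphen may follow word[i].
 */
void hyphenate(const char *word, char *position)
{
    size_t len = strlen(word);
    size_t i;

    for (i = 0; i < len; i++)
        position[i] = FALSE;

    /* Look for vowel-consonant-consonant-vowel at word[i..i+3]
     * and allow a break between the two consonants. */
    for (i = 0; i + 3 < len; i++) {
        if (is_vowel(word[i]) && is_consonant(word[i + 1]) &&
            is_consonant(word[i + 2]) && is_vowel(word[i + 3]))
            position[i + 1] = TRUE;
    }
}
```

With this sketch, hyphenate("hello", x) sets only x[2], matching the
example above.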

C source is my first choice; I'll take Pascal if pushed.

			Thanks in advance,
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

cck@deneb.ucdavis.edu (Earl H. Kinmonth) (08/18/89)

In article <654@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
>I am looking for code that will decide where English words should be
>hyphenated. I'm not too worried exactly what form of output it produces,
>since the application I'm writing can be modified to suit.

C CHEST AND OTHER C TREASURES FROM Dr. DOBB'S JOURNAL (M & T Books,
1987) contains such code (in C).  You can get a disk for $25.00 with
this and other items on it.

You might also check the items available from the C-Users Group.

dg@lakart.UUCP (David Goodenough) (08/21/89)

In article <654@lakart.UUCP> dg@lakart.UUCP (Who, Me?) writes:
>I am looking for code that will decide where English words should be
>hyphenated. I'm not too worried exactly what form of output it produces,
>since the application I'm writing can be modified to suit.

I have been told that TeX contains a really sharp hyphenation algorithm.

1. Is the TeX source in the public domain?

2. If so, can someone E-mail me the relevant bits - I'll take C or Pascal,
	or anything else, it's all going to get translated to something
	else pretty ugly anyway ( Z80 assembler - now aren't you sorry
	you asked :-) )

	Thanks in advance (again)
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com			  +---+

ray@ole.UUCP (Ray Berry) (09/16/89)

    I am looking for c src code for rule-driven hyphenation of english
words.  Does anyone have something they could e-mail?  Donations, pointers-
all are encouraged/appreciated.  Thank you.
-- 
Ray Berry  kb7ht  uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */
Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422

matthew@sunpix.UUCP ( Sun Visualization Products) (09/27/89)

In article <1333@ole.UUCP> ray@ole.UUCP (Ray Berry) writes:
|
|    I am looking for c src code for rule-driven hyphenation of english
|words.  Does anyone have something they could e-mail?  Donations, pointers-
|all are encouraged/appreciated.  Thank you.
|-- 
|Ray Berry  kb7ht  uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */
|Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422


I've seen a book called 'C Chest and Other C Treasures', which is a reprint
of DDJ articles from the C Chest column.

One of the chapters in the book is dedicated to a rule-driven hyphenation
program.


-- 
Matthew Lee Stier                            |
Sun Microsystems ---  RTP, NC  27709-3447    |     "Wisconsin   Escapee"
uucp:  sun!mstier or mcnc!rti!sunpix!matthew |
phone: (919) 469-8300 fax: (919) 460-8355    |

raymond@hilbert.berkeley.edu (Raymond Chen) (09/28/89)

In article <1333@ole.UUCP> ray@ole.UUCP (Ray Berry) writes:
|    I am looking for c src code for rule-driven hyphenation of english
|words.  Does anyone have something they could e-mail?  Donations, pointers-
|all are encouraged/appreciated.  Thank you.
|-- 
|Ray Berry  kb7ht  uucp: ...ole!ray CIS: 73407,3152 /* "inquire within" */
|Seattle Silicon Corp. 3075 112th Ave NE. Bellevue WA 98004 (206) 828-4422

If you're after perfection, look at appendix H of Knuth's TeXbook.  It
describes the hyphenation algorithm used by the TeX program (which is
in turn based on a Stanford Ph.D. thesis).  The algorithm itself is
really simple.  It misses only 14 of the commonly-used words in the
English language (4 of them being "present" "presents" "project" and
"projects", which can be hyphenated in two different ways, depending on
the context).  The TeX Users' Group (TUG) has a list of all known words
which the algorithm fails to hyphenate correctly.  (Trust me, the words
on the list are words you'd never use.  How often do you have to
hyphenate "Grothendieck"?)  In most cases, the only error in the
algorithm is that it misses hyphenation points.  It rarely places a
hyphen where there shouldn't be one.
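
To give a flavor of how simple the pattern scheme is, here is a rough
sketch in C. Each pattern is a string of letters with digits in the gaps
between them; every pattern that occurs in the word contributes its
digits, the largest digit wins in each gap, and an odd result permits a
hyphen there. This is a toy version under stated assumptions, not TeX's
code: the function name hyphenate_patterns and the helper parse_pattern
are made up, the nine patterns are only enough to handle one word (a
real pattern file has several thousand), words are assumed to be
lowercase, the word-boundary marker used by real patterns is left out,
and the limits of 2 letters before and 3 after a break are simply chosen
for this sketch. The output uses the position[] convention from earlier
in the thread.

```c
#include <string.h>

#define MAX_WORD 64

/* Illustrative patterns only; just enough for "hyphenation". */
static const char *patterns[] = {
    "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"
};

/* Split e.g. "he2n" into letters "hen" and gap digits {0,0,2,0}.
 * digits[k] belongs to the gap before letters[k]. Returns letter count. */
static size_t parse_pattern(const char *pat, char *letters, int *digits)
{
    size_t n = 0;

    digits[0] = 0;
    for (; *pat != '\0'; pat++) {
        if (*pat >= '0' && *pat <= '9') {
            digits[n] = *pat - '0';
        } else {
            letters[n++] = *pat;
            digits[n] = 0;
        }
    }
    letters[n] = '\0';
    return n;
}

/* position[i] is set to 1 if a hyphen may follow word[i], else 0. */
void hyphenate_patterns(const char *word, char *position)
{
    int score[MAX_WORD + 1] = {0};  /* score[g]: gap before word[g] */
    char letters[16];
    int digits[17];
    size_t len = strlen(word);
    size_t p, start, k;

    for (k = 0; k < len; k++)
        position[k] = 0;
    if (len > MAX_WORD)
        return;                     /* too long for this sketch */

    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        size_t n = parse_pattern(patterns[p], letters, digits);

        /* Try the pattern at every offset; keep the maximum per gap. */
        for (start = 0; start + n <= len; start++) {
            if (strncmp(word + start, letters, n) != 0)
                continue;
            for (k = 0; k <= n; k++)
                if (digits[k] > score[start + k])
                    score[start + k] = digits[k];
        }
    }

    /* Odd score permits a break; require 2 letters before, 3 after. */
    for (k = 2; k + 3 <= len; k++)
        if (score[k] % 2 == 1)
            position[k - 1] = 1;
}
```

Calling hyphenate_patterns("hyphenation", x) sets x[1] and x[5], that
is, hy-phen-ation. Note how the even digits do the real work: they
suppress breaks that a lower odd digit from another pattern would
otherwise allow.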

Disclaimer:  This is from memory.  I hope you get the idea of what
	I'm saying (i.e., read Appendix H, and get a copy of the hyphen.tex
	file from somebody).  Any errors in this article are unintentional
	and were made in good faith.

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (09/28/89)

From article <1989Sep27.235236.22920@agate.berkeley.edu>, by raymond@hilbert.berkeley.edu (Raymond Chen):
" ...
" If you're after perfection, look at appendix H of Knuth's TeXbook.
" ...  In most cases, the only error in the
" algorithm is that it misses hyphenation points.  It rarely places a
" hyphen where there shouldn't be one.

I'll give an algorithm that is even more perfect than this: Don't
hyphenate.  Then you _never_ place a hyphen where there shouldn't
be one.

To put the matter more straightforwardly, TeX's hyphenation
misses perfection by a considerable margin, since it misses
many good hyphenation points.

			Greg, lee@uhccux.uhcc.hawaii.edu