[comp.text] incorrect hyphenation in

fitz@mml0.meche.rpi.edu (Brian Fitzgerald) (01/24/91)

Is there any way to predict incorrect hyphenation in troff?

How about a shell script (a little like spell) that compares a "stop
list" to the text and perhaps prints warnings or generates ".hw"
requests?

I am currently using the troff that came with SunOS 4.0.3.

Brian Fitzgerald

npn@cbnewsl.att.com (nils-peter.nelson) (01/25/91)

In article <MJ7^8$@rpi.edu>, fitz@mml0.meche.rpi.edu (Brian Fitzgerald) writes:
> Is there any way to predict incorrect hyphenation in troff?
> 
> How about a shell script (a little like spell) that compares a "stop
> list" to the text and perhaps prints warnings or generates ".hw"
> requests?
> 
> I am currently using the troff that came with SunOS 4.0.3.
> 
> Brian Fitzgerald


Sorry, but not so easy. The source code has a 19,000 byte table
of suffixes, and there is a binary encoded digram table.
Neither is user-accessible in the binary code.  We test the
algorithm by forcing the line length to something wee (.2 inches),
running a dictionary through it, and comparing the result to a
previously hyphenated dictionary.
For DWB 3.2 we are adding the TeX hyphenation algorithm as a
user-selectable alternative. It seems to be more accurate, if
slower and larger, but may not be any more predictable.
If it's any consolation, either algorithm does better
than most people.
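
A minimal sketch of the test procedure described above, in Python. The
function names and the word-to-hyphenated-form mapping are assumptions made
for illustration; this is not the DWB test harness. It only builds the troff
input and compares two hyphenated word lists. Recovering the hyphenated
words from troff's output is left out, since that depends on the output
device.

```python
def make_troff_input(words, line_length="0.2i"):
    """Build troff input that sets a very short line length and
    puts each dictionary word on its own output line."""
    lines = [".ll " + line_length,  # force a tiny line so words must break
             ".hy 1"]               # make sure hyphenation is switched on
    for word in words:
        lines.append(word)
        lines.append(".br")         # break so words do not run together
    return "\n".join(lines) + "\n"


def compare_hyphenation(reference, candidate):
    """Both arguments map a word to its hyphenated form ("ta-ble").
    Return (word, expected, got) for every word that differs or is
    missing from the candidate."""
    diffs = []
    for word, expected in sorted(reference.items()):
        got = candidate.get(word)
        if got != expected:
            diffs.append((word, expected, got))
    return diffs
```

The reference mapping would come from the previously hyphenated dictionary,
and the candidate mapping from whatever the current troff produced.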

jaap@mtxinu.COM (Jaap Akkerhuis) (01/29/91)

In article <1991Jan24.223645.16630@cbnewsl.att.com> npn@cbnewsl.att.com (nils-peter.nelson) writes:
 > In article <MJ7^8$@rpi.edu>, fitz@mml0.meche.rpi.edu (Brian Fitzgerald) writes:
 > > Is there any way to predict incorrect hyphenation in troff?
 > > 
 > > How about a shell script (a little like spell) that compares a "stop
 > > list" to the text and perhaps prints warnings or generates ".hw"
 > > requests?
 > > 
 > > I am currently using the troff that came with SunOS 4.0.3.
 > > 
 > > Brian Fitzgerald
 > 
 > 
 > Sorry, but not so easy. The source code has a 19,000 byte table
 > of suffixes, and there is a binary encoded digram table.

Yes, it is not easy to predict the hyphenation of old troff, but if
you have source, it is not difficult to rip the hyphenation algorithm
out of the program.

Note that the hyphenation exception list (.hw) holds only about 128
bytes, so it overflows quickly. If you don't like the way troff
hyphenates for you and don't want to wait for (or don't care about)
the support of Liang's algorithm, you can do it yourself.
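
For the stop-list idea from the original question, here is a rough Python
sketch that emits ".hw" requests only for the words that actually occur in
the text and stops adding entries once a byte budget is used up. The
function name, the shape of the stop list, and the way the budget is
counted (word length plus one byte) are all assumptions; the 128 is just
the figure mentioned above, not a documented limit.

```python
import re


def hw_requests(text, stop_list, budget=128):
    """stop_list maps a plain word to its hyphenated form,
    e.g. {"present": "pres-ent"}.  Returns (requests, skipped)."""
    seen = set(re.findall(r"[A-Za-z]+", text.lower()))
    requests, skipped = [], []
    used = 0
    for word in sorted(stop_list):
        if word not in seen:
            continue                  # word never appears, save the space
        entry = stop_list[word]
        cost = len(entry) + 1         # assumed cost: the entry plus a separator
        if used + cost > budget:
            skipped.append(word)      # would overflow the exception list
            continue
        requests.append(".hw " + entry)
        used += cost
    return requests, skipped
```

The skipped words are the ones worth a warning, since troff would fall
back to its built-in rules for them.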

I've seen a program which hyphenated Dutch text. It was used as a
preprocessor for troff and inserted the hyphenation character in every
legal place it could find. This way the text wasn't hyphenated with
the built-in stuff.

So, using this method, you can always force troff to hyphenate
where you want while ignoring the built-in rules.
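
A minimal sketch of such a preprocessor in Python, assuming the default
hyphenation character \% and a lookup table of known break points. The
table and the function name are made up for illustration; the Dutch
program found its break points by rule, not from a table like this.

```python
import re


def mark_hyphens(line, breaks):
    """breaks maps a lowercase word to its hyphenated form ("hy-phen-ation").
    Returns the line with \\% inserted at each allowed break point."""
    if line.startswith((".", "'")):
        return line                   # leave troff requests untouched

    def repl(match):
        word = match.group(0)
        pattern = breaks.get(word.lower())
        if pattern is None or pattern.replace("-", "") != word.lower():
            return word               # unknown word or bad table entry
        out, i = [], 0
        for ch in pattern:
            if ch == "-":
                out.append("\\%")     # troff hyphenation character
            else:
                out.append(word[i])   # keep the original capitalisation
                i += 1
        return "".join(out)

    return re.sub(r"[A-Za-z]+", repl, line)
```

Run over each input line before troff sees it. Words not in the table are
passed through unchanged, so troff still applies its own rules to those.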

	jaap