[comp.edu] Text analysis

dsims@uceng.UC.EDU (david l sims) (10/26/90)

sinowitz@pilot.njin.net (Jonah Sinowitz) writes:

>	I am interested in finding a program to help me with
>some textual analysis and indexing. My task is to index a
>few books -- all of which are on my ATT3B, in ascii.

Jonah,

A while back I wrote a program that creates exhaustive concordances
for plain ASCII text files. The output consists of a heading for each
distinct word in the document. Under each heading is one sub-entry per
occurrence, giving the page and line number of that occurrence,
followed by the word shown in context.

For example, consider the following text.

Mary had a little lamb.
It's fleece was white as snow.
And everywhere that Mary went.
The lamb was sure to go.

My program concorded the above text and produced the following output.

EVERYWHERE
1:3 And e. that Mary went.		/* page 1, line 3 */
FLEECE
1:2 It's f. was white as
GO
1:4 was sure to g..
HAD
1:1 Mary h. a little lamb.
IT'S
1:2 I. fleece was white
LAMB
1:1 had a little l..
1:4 The l. was sure to
LITTLE
1:1 Mary had a l. lamb.
MARY
1:1 M. had a little
1:3 And everywhere that M. went.
SNOW
1:2 was white as s..
SURE
1:4 The lamb was s. to go.
WENT
1:3 everywhere that Mary w..
WHITE
1:2 It's fleece was w. as snow.
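
For readers who want to experiment, here is a minimal Python sketch that
produces output in the same shape as the listing above. It is not the
program described in this post. Several details are assumptions read off
the sample: the stop-word list (the common words that have no entry
above), the three-word context window on each side of the keyword, and
the fixed page length of 66 lines. The names concordance,
format_concordance, STOP_WORDS, and WINDOW are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: these are the words with no entry in the sample listing.
STOP_WORDS = {"a", "and", "as", "that", "the", "to", "was"}

# Assumption: up to three words of context on each side, within one line.
WINDOW = 3


def concordance(text, lines_per_page=66):
    """Map each upper-cased keyword to a list of 'page:line context' strings."""
    entries = defaultdict(list)
    for index, line in enumerate(text.splitlines()):
        # Assumption: pages are fixed-length blocks of lines.
        page, line_no = divmod(index, lines_per_page)
        tokens = line.split()
        for i, token in enumerate(tokens):
            # Split a token into the word itself and any trailing punctuation.
            match = re.match(r"([A-Za-z][A-Za-z']*)(.*)", token)
            if match is None:
                continue
            word, trailing = match.groups()
            if word.lower() in STOP_WORDS:
                continue
            # Abbreviate the keyword to its first letter plus a period,
            # keeping whatever punctuation followed it in the text.
            abbreviated = word[0] + "." + trailing
            before = tokens[max(0, i - WINDOW):i]
            after = tokens[i + 1:i + 1 + WINDOW]
            context = " ".join(before + [abbreviated] + after)
            entries[word.upper()].append(f"{page + 1}:{line_no + 1} {context}")
    return dict(entries)


def format_concordance(entries):
    """Render headings in alphabetical order, each followed by its sub-entries."""
    lines = []
    for heading in sorted(entries):
        lines.append(heading)
        lines.extend(entries[heading])
    return "\n".join(lines)
```

Calling format_concordance(concordance(text)) on the four-line rhyme
above gives the same headings and sub-entries as the listing, minus the
page/line comment. A real indexing job would likely need a longer
stop-word list and page breaks taken from the book's own layout rather
than a fixed line count.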

Unfortunately, the program runs only on MS-DOS machines. I would need a bit
of time to port it to Unix, which is what I assume you are running.
Moreover, the program is commercial. I sell it for $25.

If you have any questions or comments, send me some mail.

DISCLAIMER: Yes, I wrote the program, and I would profit from sales
of this product.

David Sims
dsims@uceng.uc.edu