[comp.lang.c] Information Retrieval Algorithms Wanted

mpledger@cti1.UUCP (Mark Pledger) (02/18/91)

Sorry for the wide cross-posting, but the following request is relevant
to several of these newsgroups.

I am in the midst of doing Information Retrieval (IR) research in a parallel
environment.  I would appreciate any information leading to algorithms
for word stemming and/or thesaurus searching.  Useful sources include
technical journals, research papers, source code, etc.  I would prefer either
pseudocode or source code; however, I will extract the algorithms from written
papers if needed.

Word stemming involves paring off the prefix or suffix of a word, reducing it
to its most basic form.  For example, passing the word "running" through a
word stemmer produces the output word "run".
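For illustration only, here is a minimal suffix-stripping sketch in C.  It is
a toy, assuming just three English suffixes ("ing", "ed", "s") and a simple
rule that undoubles a trailing consonant pair, which is what turns "running"
into "run".  The function names (toy_stem, has_vowel, is_vowel) are
hypothetical; published rule sets such as Porter's suffix-stripping algorithm
handle far more cases.

```c
#include <string.h>

/* Hypothetical helper: nonzero if c is a lowercase vowel ('y' is ignored). */
static int is_vowel(char c) {
    return c != '\0' && strchr("aeiou", c) != NULL;
}

/* Hypothetical helper: nonzero if the first n chars of s contain a vowel. */
static int has_vowel(const char *s, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (is_vowel(s[i]))
            return 1;
    return 0;
}

/* Toy stemmer: strips at most one suffix in place and returns word.
 * Assumes word is lowercase ASCII and writable. */
char *toy_stem(char *word) {
    static const char *suffixes[] = {"ing", "ed", "s"};
    size_t len = strlen(word);

    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t sl = strlen(suffixes[i]);
        /* Require at least two characters left and a matching ending. */
        if (len <= sl + 1 || strcmp(word + len - sl, suffixes[i]) != 0)
            continue;
        /* Leave "ss" endings alone: "glass" stays "glass". */
        if (sl == 1 && word[len - 2] == 's')
            continue;
        /* The remaining stem must contain a vowel ("sing" is not "s"). */
        if (!has_vowel(word, len - sl))
            continue;

        len -= sl;
        word[len] = '\0';
        /* Undouble a final consonant pair: "runn" -> "run", "stopp" -> "stop",
         * but keep "ll", "ss", "zz" as in "fall". */
        if (len >= 2 && word[len - 1] == word[len - 2] &&
            !is_vowel(word[len - 1]) && strchr("lsz", word[len - 1]) == NULL)
            word[--len] = '\0';
        break;
    }
    return word;
}
```

Because the stemmer edits the string in place, callers should pass a
writable buffer (e.g. `char w[] = "running";`), not a string literal.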

Thesaurus algorithms search for a set of similar words based on the input
word given (e.g., the thesaurus feature in many word processors).
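As a rough sketch of the simplest case, a thesaurus lookup can be a sorted
table of headwords, each with a list of related words, searched with the
standard C `bsearch()`.  The struct, the sample entries, and the name
thes_lookup are hypothetical; a real IR system would load a much larger
table from disk and would usually normalize (e.g. stem) the input word
before looking it up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: a headword and a NULL-terminated list of related words. */
struct thes_entry {
    const char *word;
    const char *synonyms[4];
};

/* Toy data; must stay sorted by headword for bsearch() to work. */
static const struct thes_entry thesaurus[] = {
    {"big",  {"large", "huge", "great", NULL}},
    {"fast", {"quick", "rapid", "swift", NULL}},
    {"run",  {"sprint", "dash", "race", NULL}},
};

/* bsearch() comparator: key is a C string, elem is a table entry. */
static int cmp_entry(const void *key, const void *elem) {
    const char *k = key;
    const struct thes_entry *e = elem;
    return strcmp(k, e->word);
}

/* Returns the NULL-terminated synonym list for word, or NULL if absent. */
const char *const *thes_lookup(const char *word) {
    const struct thes_entry *e =
        bsearch(word, thesaurus, sizeof thesaurus / sizeof thesaurus[0],
                sizeof thesaurus[0], cmp_entry);
    return e ? e->synonyms : NULL;
}
```

A sorted array with binary search keeps lookups at O(log n) without extra
memory; a hash table would be the usual next step for a large vocabulary.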

If you can help, I'd appreciate any and all responses.  I will summarize
if enough interest is generated.  Thank you.



-- 
Sincerely,


Mark Pledger

--------------------------------------------------------------------------
CTI                              |              (703) 685-5434 [voice]
2121 Crystal Drive               |              (703) 685-7022 [fax]
Suite 103                        |              
Arlington, VA  22202             |              mpledger@cti.com
--------------------------------------------------------------------------

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (02/21/91)

This has nothing to do with anything that has been or might be posted to
comp.binaries.ibm.pc!  Please do not post replies, and if you must contact
the original poster, please mention that this is the wrong group.
-- 
bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
    sysop *IX BBS and Public Access UNIX
    moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me