[comp.ai] looking for work on text `interest score' computation

eric@snark.uu.net (Eric S. Raymond) (07/24/89)

Please help me save USENET from uselessness! ;-)

I'm looking for methods for filtering news articles for `interest level'
according to preferences expressed by the user, sort of a generalization
of the kill-files concept.

The idea is to use such software as a noise filter. This will be useful for
the present USENET, and become even more important as I turn USENET into
a distributed-hypertext system.

I would like to hear about any such work, whether statistical or knowledge-
based and *no matter how slow the method is or what language it's in!*

Code that is available and useful will be ported into the TMNN netnews suite and
made available for public redistribution under copyleft.

Thanks in advance...
-- 
      Eric S. Raymond = eric@snark.uu.net    (mad mastermind of TMN-Netnews)

david@banzai.UUCP (David Beutel) (07/26/89)

In article <1SDNCv#4Gg1CL=eric@snark.uu.net> eric@snark.uu.net (Eric S. Raymond) writes:
>Please help me save USENET from uselessness! ;-)
>
>I'm looking for methods for filtering news articles for `interest level'
>according to preferences expressed by the user, sort of a generalization
>of the kill-files concept.
>
[...]
>I would like to hear about any such work, whether statistical or knowledge-
>based and *no matter how slow the method is or what language it's in!*

I think the statistical approach would be best: given the complexity
of natural language, a knowledge-based system couldn't hope to
understand a news article (so there's no win for the extra complexity
of programming a logical system).

Specifically, I think a neural network of some type would work best.
The reader would train his/her filter by rating each article s/he reads.
The rating could be a simple 0..9 of how interested the reader was in
the article, or several different categories (content, style, subject...)
could be rated.

The network would train itself by taking the article as the input vector, 
producing the rating it thinks the reader would give it, comparing its
rating to the real rating the reader gave, and adjusting itself
until the rating it produces is the same as the real rating.  After
a learning period, the NN would provide the reader with an accurate
rating--a forecast of how the reader would rate an article.  When the NN is 
wrong, the reader can give it the correct rating--so, if the reader's 
tastes or interests change, the NN changes too.
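
To make that loop concrete, here is a minimal sketch, assuming the
simplest possible "network": a single linear unit trained with the
delta (LMS) rule.  The two-element feature vectors, the learning rate
and the function names are all made up for illustration; I'm not
claiming this is the right architecture, only that the
predict/compare/adjust cycle looks roughly like this.

```python
# Sketch only: a single linear unit trained with the delta (LMS) rule.
# The features, learning rate, and 0..9 clamp are illustrative assumptions.

def predict(weights, bias, features):
    """Forecast a rating for one article's feature vector, clamped to 0..9."""
    raw = bias + sum(w * x for w, x in zip(weights, features))
    return max(0.0, min(9.0, raw))

def train_step(weights, bias, features, reader_rating, rate=0.1):
    """Nudge the weights toward the rating the reader actually gave."""
    raw = bias + sum(w * x for w, x in zip(weights, features))
    error = reader_rating - raw
    new_weights = [w + rate * error * x for w, x in zip(weights, features)]
    new_bias = bias + rate * error
    return new_weights, new_bias

# Two made-up articles: [mentions "neural", mentions "flame"]
examples = [([1.0, 0.0], 8.0), ([0.0, 1.0], 1.0)]
weights, bias = [0.0, 0.0], 0.0
for _ in range(200):
    for features, rating in examples:
        weights, bias = train_step(weights, bias, features, rating)

print(round(predict(weights, bias, [1.0, 0.0])))
print(round(predict(weights, bias, [0.0, 1.0])))
```

Because train_step can be called again every time the reader corrects
a forecast, the same code covers the "tastes change" case: the weights
simply keep drifting toward the newest ratings.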

The benefit of the system is that it would be objective and passive,
as opposed to a keyword system which makes demands upon the author.
It would also be personalized to the reader, as opposed to a moderator
who, albeit incomparably smarter than an NN, may not share the individual
reader's tastes.  In a sophisticated newsreader, the NN could show
the score near the subject and keywords of an article, and the reader
could use this advice when deciding what to read.  The newsreader
could also mask articles that have a score less than 5, for instance.
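
The newsreader side of this is the easy part.  A rough sketch,
assuming the forecast scores have already been computed somehow (the
subjects, scores and function names below are invented for the
example):

```python
# Sketch only: hide articles whose forecast score falls below a cutoff.
# The (subject, score) pairs and the cutoff of 5 are illustrative.

def visible_articles(scored, cutoff=5):
    """Return the articles to show, highest forecast score first."""
    kept = [(subject, score) for subject, score in scored if score >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)

def menu_line(subject, score):
    """Format one line of the article menu with the score beside the subject."""
    return "[%d] %s" % (score, subject)

scored = [("Re: backprop question", 8),
          ("Re: Re: flame war", 2),
          ("CFP: workshop", 5)]
for subject, score in visible_articles(scored):
    print(menu_line(subject, score))
```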

I don't know how to make such a neural network system--there are 
two big questions that I don't know how to answer:

	1) How should the articles be pre-processed for vectorized input
		to the network?  I.e., what's the best way of looking
		at an article to produce a rating for it?  This
		involves producing statistics as well as passing some 
		text straight thru.

	2) What sort of neural network is best for this?
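
For question 1, one guess at a starting point (and it is only a
guess) is to hash each word into a fixed number of buckets and append
a couple of crude statistics.  The bucket count, the choice of
statistics and the name article_vector are assumptions made up for
this sketch, not a known-good recipe.

```python
import re
import zlib

def article_vector(text, buckets=64):
    """Turn article text into a fixed-length list of numbers.

    The first `buckets` slots are hashed word frequencies; the last two
    are simple statistics (fraction of quoted lines, scaled word count).
    """
    words = re.findall(r"[a-z']+", text.lower())
    vector = [0.0] * buckets
    for word in words:
        # crc32 is stable across runs, unlike Python's built-in hash().
        vector[zlib.crc32(word.encode("ascii")) % buckets] += 1.0
    if words:
        vector = [count / len(words) for count in vector]
    lines = text.splitlines()
    quoted = sum(1 for line in lines if line.startswith(">"))
    quoted_fraction = quoted / len(lines) if lines else 0.0
    length_feature = min(len(words) / 1000.0, 1.0)
    return vector + [quoted_fraction, length_feature]
```

Every article comes out the same length, which is what a network's
input layer needs, but hashing throws away word order entirely, so
this says nothing about style.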

I hope someone else will speculate!  Maybe someone out there has
tried preparing text for generating statistical ratings, especially
by neural network?
-- 

J. David Beutel_______________11011011________________People's Computer Company
"I am, therefore I am."                             `Revolutionary Programming'
                     ...!uunet!uvm-gen!banzai!david