eric@snark.uu.net (Eric S. Raymond) (07/24/89)
Please help me save USENET from uselessness! ;-)

I'm looking for methods for filtering news articles for `interest level' according to preferences expressed by the user, sort of a generalization of the kill-files concept. The idea is to use such software as a noise filter. This will be useful for the present USENET, and become even more important as I turn USENET into a distributed-hypertext system.

I would like to hear about any such work, whether statistical or knowledge-based and *no matter how slow the method is or what language it's in!* Code that is available and useful will be ported into the TMNN netnews suite and made available for public redistribution under copyleft.

Thanks in advance...
--
Eric S. Raymond = eric@snark.uu.net (mad mastermind of TMN-Netnews)
david@banzai.UUCP (David Beutel) (07/26/89)
In article <1SDNCv#4Gg1CL=eric@snark.uu.net> eric@snark.uu.net (Eric S. Raymond) writes:
>Please help me save USENET from uselessness! ;-)
>
>I'm looking for methods for filtering news articles for `interest level'
>according to preferences expressed by the user, sort of a generalization
>of the kill-files concept.
> [...]
>I would like to hear about any such work, whether statistical or
>knowledge-based and *no matter how slow the method is or what language
>it's in!*

I think the statistical approach would be the best because, given the complexity of natural language, a knowledge-based system couldn't hope to understand a news article (thus no win for the extra complexity of programming a logical system). Specifically, I think a neural network of some type would work best.

The reader would train his/her filter by rating each article s/he reads. The rating could be a simple 0..9 of how interested the reader was in the article, or several different categories (content, style, subject...) could be rated. The network would train itself by taking the article as the input vector, producing the rating it thinks the reader would give it, comparing its rating to the real rating the reader gave, and adjusting itself until the rating it produces is the same as the real rating.

After a learning period, the NN would provide the reader with an accurate rating--a forecast of how the reader would rate an article. When the NN is wrong, the reader can give it the correct rating--so, if the reader's tastes or interests change, the NN changes too.

The benefit of the system is that it would be objective and passive, as opposed to a keyword system, which makes demands upon the author. It would also be personalized to the reader, as opposed to a moderator who, albeit incomparably smarter than a NN, may not share the individual reader's tastes.
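The train-by-rating loop described above can be sketched in a few lines. The post does not name a network type, so this sketch assumes the simplest possible case: a single linear unit trained with the delta (LMS) rule. The vocabulary, the learning rate, the starting bias, and the names `VOCAB`, `vectorize`, and `RatingUnit` are all illustrative assumptions, not anything proposed in the thread.

```python
# Sketch only: one linear unit trained with the delta (LMS) rule.
# VOCAB, the learning rate, and the sample articles are illustrative assumptions.

VOCAB = ["neural", "network", "filter", "flame", "spam"]  # hypothetical vocabulary


def vectorize(text):
    """Turn an article into word counts over VOCAB (a crude input vector)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


class RatingUnit:
    """Forecasts a 0..9 interest rating and learns from the reader's corrections."""

    def __init__(self, n_inputs, rate=0.05):
        self.weights = [0.0] * n_inputs
        self.bias = 4.5  # assumed starting point: the middle of the 0..9 scale
        self.rate = rate

    def predict(self, x):
        """Return the forecast rating, clamped to the 0..9 scale."""
        raw = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return max(0.0, min(9.0, raw))

    def train(self, x, reader_rating):
        """Compare the forecast with the reader's rating and adjust the weights."""
        error = reader_rating - self.predict(x)
        self.weights = [w + self.rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias += self.rate * error
        return error


if __name__ == "__main__":
    unit = RatingUnit(len(VOCAB))
    liked = vectorize("neural network filter")
    disliked = vectorize("flame flame spam")
    for _ in range(200):          # the "learning period"
        unit.train(liked, 9)      # reader rated this article 9
        unit.train(disliked, 1)   # reader rated this article 1
    print(unit.predict(liked), unit.predict(disliked))
```

Because `train` can be called every time the reader overrides a forecast, the unit keeps tracking the reader's tastes as they drift, which is the behaviour the post asks for. A real filter would need a far richer input vector and probably a multi-layer network.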
In a sophisticated newsreader, the NN could show the score near the subject and keywords of an article, and the reader could use this advice when deciding what to read. The newsreader could also mask articles that have a score less than 5, for instance.

I don't know how to make such a neural network system--there are two big questions that I don't know how to answer:

1) How should the articles be pre-processed for vectorized input to the network? I.e., what's the best way of looking at an article to produce a rating for it? This involves producing statistics as well as passing some text straight thru.

2) What sort of neural network is best for this?

I hope someone else will speculate! Maybe someone out there has tried preparing text for generating statistical ratings, especially by neural network?
--
J. David Beutel_______________11011011________________People's Computer Company
"I am, therefore I am." `Revolutionary Programming' ...!uunet!uvm-gen!banzai!david
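One possible answer to question 1, offered only as a sketch: hash each word of the article into a fixed number of buckets, so that articles of any length become input vectors of the same size. The bucket count, the choice to skip quoted lines, and the names `tokenize`, `hashed_vector`, and `visible` are assumptions made for illustration; nothing in the thread specifies them. The last function shows the "mask articles that have a score less than 5" idea.

```python
import re
import zlib

N_BUCKETS = 64  # hypothetical input-vector length


def tokenize(article):
    """Lowercase the text and split it into words.

    Lines starting with '>' are skipped, on the assumption that quoted
    material says little about the new article itself.
    """
    lines = [ln for ln in article.splitlines()
             if not ln.lstrip().startswith(">")]
    return re.findall(r"[a-z']+", " ".join(lines).lower())


def hashed_vector(article, n_buckets=N_BUCKETS):
    """Map an article to a fixed-length vector of relative word frequencies."""
    vec = [0.0] * n_buckets
    tokens = tokenize(article)
    for tok in tokens:
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1.0
    if tokens:
        vec = [v / len(tokens) for v in vec]
    return vec


def visible(scored_articles, threshold=5):
    """Keep only (score, subject) pairs whose forecast score is at least threshold."""
    return [(score, subject) for score, subject in scored_articles
            if score >= threshold]


if __name__ == "__main__":
    vec = hashed_vector(">old quoted text\nNew thoughts on news filtering")
    print(len(vec), sum(vec))
    print(visible([(7, "Filtering news"), (3, "Flame war"), (5, "TMNN port")]))
```

Hashing trades some accuracy for simplicity: unrelated words can collide in one bucket, but no vocabulary has to be fixed in advance. The "passing some text straight thru" part of the question is not covered here.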