henry@utzoo.UUCP (Henry Spencer) (07/07/87)
> A few years ago I, and some others were arguing fairly strenuously that
> some kind of keyword based news reader was required to cut down on the
> amount of chaff you have to search through to find the odd kernel of wheat.
> In the end, the discussion went the way of the Dodo...

Well, there were reasons for that.  I and some others were counter-arguing
fairly strenuously that keyword-based news readers will not work unless the
keywords are well-chosen, which they wouldn't be.  The successful
keyword-based retrieval services maintain tight central control over their
keyword list and often use experts to assign keywords to new material.
There is just no way to make that work on Usenet.

What's more, the few studies that have been done on retrieval efficiency
show that users *think* they are getting 80% or so of relevant material,
while the real number is more like 25%.  That is, even well-run
keyword-based systems show you only about one in every four kernels of
wheat.  It's pretty, but it just don't work.
-- 
Mars must wait -- we have unfinished business on the Moon.
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
brad@looking.UUCP (07/07/87)
In article <8262@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Well, there were reasons for that.  I and some others were counter-arguing
>fairly strenuously that keyword-based news readers will not work unless the
>keywords are well-chosen, which they wouldn't be.  The successful keyword-
>based retrieval services maintain tight central control over their keyword
>list and often use experts to assign keywords to new material.  There is
>just no way to make that work on Usenet.
>
>Mars must wait -- we have		Henry Spencer @ U of Toronto Zoology

My original proposal of K news (must be almost 5 years ago now!) did
suggest user-generated keywords.  That idea comes from a smaller net, and
Henry has a point that it might not work well now.

What if we take the other features of K news but require some central
authority (a "moderator?") for the creation of keywords?

Right now there are three reasons to NOT create a newsgroup:

	1) The news software did not envision so many groups, so there are
	   hard memory limits on many machines on the number of groups in
	   the active file
	2) The net might get too confusing with too many groups
	3) People aren't interested enough in the idea to warrant spending
	   money sending the stuff all over the world

(Note that "volume would be too low" is not a reason at all.  In fact, it's
an anti-reason.  The lower the volume in a group the better.  Today's
high-volume groups are useless to most people.  They just don't have time
to wade through them.)

K news was designed to get rid of reason #1.  With good reason, for the
high-volume groups that are the result of reason #1 cost everybody a lot of
money, and waste a lot of time for the people who read them.

Reason #2 can't be solved well with software.  It is a trade-off we must
accept.  The more specific news classification is, the harder it is to
comprehend it all.  The less specific it is, the noisier groups get with
random postings and other crap.  You don't want a net with only one group
called "misc" and you don't want 20,000 groups either.

Reason #3 was solved by the use of K news's powerful subscription file as a
distribution file.  Because site subscription is convenient, minor keywords
could be set up to limit distribution to only those sites with readers.
This would make distribution MORE efficient than a mailing list.

-------------

So other than the S.M.O.P. involved, why not K news?  It solved (5 years
ago) just about every major problem we have today.

Because of the lack of software restrictions, the keyword creation
moderators would not have to be particularly controversial.  Instead of
asking "why create this group?" the question would be "why not?"  Keyword
moderators would mostly ensure that keywords followed a good pattern, and
that keyword association dictionaries were kept up to date.

(Most people on this list have seen the K news plan, so I won't post it
unless I get a lot of requests.)
-- 
Brad Templeton, Looking Glass Software Ltd. - Waterloo, Ontario 519/884-7473
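
The K news plan itself is not reproduced in this thread, so the following is
only a guess at the "subscription file as distribution file" idea, in Python:
an article is offered only to sites that subscribe to at least one of its
keywords.  The function name, the data layout, and the site names are all
hypothetical, and the sketch ignores the case where a site must relay
articles for downstream neighbours.

```python
def sites_to_forward(article_keywords, site_subscriptions):
    """Return the sites subscribed to at least one of the article's keywords.

    article_keywords   -- iterable of keyword strings on the article
    site_subscriptions -- dict mapping a (hypothetical) site name to the
                          set of keywords that site's readers subscribe to
    """
    wanted = {k.lower() for k in article_keywords}
    return sorted(
        site
        for site, subscribed in site_subscriptions.items()
        if wanted & {s.lower() for s in subscribed}
    )


# Hypothetical subscription data, for illustration only.
subscriptions = {
    "sitea": {"unix", "keywords"},
    "siteb": {"cooking"},
    "sitec": {"Keywords"},
}

print(sites_to_forward(["keywords", "usenet"], subscriptions))
```

Under these assumptions a minor keyword with readers at only two sites is
sent to those two sites and nowhere else, which is the efficiency claim made
above.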
taylor@hplabsc.UUCP (07/07/87)
As a side note, I hacked up a newsreader that is based purely on keywords
to see what it would be like.  It took all the words in the Subject:,
Summary:, and Keywords: lines, 'uniq'ed them and removed 'noise' words
(e.g. the, and, a, <nf>, etc.) and then logged 'em in a file as they
arrived on the machine.  Then the intrepid user would say "I want to read
news about x, y, and z" and be shown the news *independent of what groups
they were posted in*.

I used it for a few days and found that it worked QUITE well.  The biggest
problem I could see was that it would become very difficult to figure out
what group(s) to post a completely new article to, since users of this
knews system would unlearn the distinction between newsgroups.  This isn't
good because the program and users would have to live in harmony with the
rest of the net...

A fun experiment, and it supported my theory that grouping articles into a
small number of newsgroups is indeed as archaic as it seems: I found
articles and discussions in groups that I had never even read, because they
were indeed keyworded (see above) correctly.

And as to the stuff that isn't keyworded correctly, well, if you think
about it, as more and more people were to use a system of this nature
the articles would become better and better keyworded since if you are
going to go to the trouble of WRITING an article, you certainly want to
make sure that the maximal number of people READ it, right?  (This can
be helped by some decent frontend software too - stuff that allows the
user to edit the subject line, prompts for a "summary line", and perhaps
does a crude first pass automatic keyword list.)  The key is that it is a
lot easier for people to modify something than create it, typically.

*sigh*  I can imagine the hostile remarks this posting is going to
generate.  We've had, as people have pointed out, this discussion before.
A great number of schemes have been proposed to the net, including this
keywording, Webber's multiple moderators, Fair's accolades, and such, and
somehow we keep ending up with these artificial newsgroup boundaries,
articles that are more likely to be cross-posted than not, and discussion
threads that are doomed to follow the 'base note' regardless of whether we
are still on topic or not...it's always the lowest common denominator.

Maybe there's a lesson to be learned in all this??

Anyway, for what it's worth...I shall attempt to find a few free evenings
and get my knews reader up to a sufficient state to allow me to post it
to net.sources (errr, to whatever group is appropriate, since It Is Obvious
that Unmoderated Groups are Evil (even though I have proposed a scheme to
alleviate the problems cited with the old unmoderated newsgroups)).

*sigh*

From the far corners of the universe,

						-- Dave Taylor
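
The header-only scheme described in the post above can be sketched in a few
lines of Python.  This is not Dave Taylor's code, which was not posted; the
names KeywordIndex, extract_keywords and NOISE_WORDS are made up here, and
every noise word beyond the four named in the post is an assumption.

```python
from collections import defaultdict

# "the", "and", "a" and "<nf>" come from the post; "re" is an assumption.
NOISE_WORDS = {"the", "and", "a", "<nf>", "re"}
HEADER_FIELDS = ("subject", "summary", "keywords")


def extract_keywords(article_text):
    """Collect unique non-noise words from Subject/Summary/Keywords lines."""
    words = set()
    for line in article_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header; the body is never read
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in HEADER_FIELDS:
            for raw in value.lower().split():
                word = raw.strip(".,;:!?()\"'")
                if word and word not in NOISE_WORDS:
                    words.add(word)
    return words


class KeywordIndex:
    """Maps each keyword to the IDs of the articles whose headers use it."""

    def __init__(self):
        self._by_word = defaultdict(set)

    def add(self, article_id, article_text):
        for word in extract_keywords(article_text):
            self._by_word[word].add(article_id)

    def search(self, *wanted):
        """Articles matching ANY wanted word, regardless of newsgroup."""
        found = set()
        for word in wanted:
            found |= self._by_word.get(word.lower(), set())
        return sorted(found)


index = KeywordIndex()
index.add("<1@a>", "Newsgroups: comp.unix\nSubject: Re: the ptrace and wait "
                   "problem\nKeywords: adb, System V\n\nBody mentions modem.")
index.add("<2@b>", "Newsgroups: rec.misc\nSubject: a modem question\n\nwait")
print(index.search("wait", "modem"))
```

Note that the Newsgroups: line plays no part in the lookup, and that a word
appearing only in an article's body is never found.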
allbery@ncoast.UUCP (Brandon Allbery) (07/11/87)
As quoted from <2185@hplabsc.UUCP> by taylor@hplabs.HP.COM (Dave Taylor):
+---------------
| As a side note, I hacked up a newsreader that is based purely on keywords
|
| And as to the stuff that isn't keyworded correctly, well, if you think
| about it, as more and more people were to use a system of this nature
| the articles would become better and better keyworded since if you are
| going to go to the trouble of WRITING an article, you certainly want to
| make sure that the maximal number of people READ it, right?  (this can
+---------------

From experience:  Someone may, in an article keyworded to A, B, C, and D,
make a reference to E which is so minor as to not deserve keywording...
until it turns out that that reference answers another person's question,
but that person never gets to see it on a keyword search for E.  In fact,
you can change "may" to "will"; it happens all the time.

The only way I see to get keywords working is to potentially use every word
in an article (both header and body) that is not a syntactic particle as a
keyword, after standardizing case and attempting to deal with spelling and
prefixes/suffixes.  This doesn't strike me as being very fast,
space-conservative, or (without either a better AI program than we've got or
a (horrors!) moderator choosing the keywords) likely to be correct.  (And
even the moderator can mess up.)

Of course, omitting syntactic particles makes it difficult to find the
article in what is now soc.lang.english (if there is such; I haven't
checked) on uses of the word "the"....

+---------------
| and get my knews reader up to a sufficient state to allow me to post it
| to net.sources (errr, to whatever group is appropriate, since It Is Obvious
| that Unmoderated Groups are Evil (even though I have proposed a scheme to
+---------------

comp.sources.misc

The problems with any unmoderated scheme are amply demonstrated by the bogus
posting by richard@bigtuna.UUCP of a month back.  It doesn't matter WHAT you
do, people will scream bloody murder if they can't use net.sources as
comp.sources.d.  (There were more discussions in net.sources than in
comp.sources.d during its final two weeks, even ignoring the discussions
about net.sources becoming moderated.  How do I know?  Erik Fair jumped the
gun and my mailbox was suddenly filled with 15 duplicate copies of every
message sent to net.sources.  Once I eliminated the duplicates, the amount
of non-source in net.sources was absolutely disgusting.)

I, too, will try to find time to implement the keyword scheme I discussed
above:  I'm interested in seeing how bad it really is.  I hate to imagine
the keyword database, though....
-- 
 [Copyright 1987 Brandon S. Allbery, all rights reserved] \ ncoast 216 781 6201
 [Redistributable only if redistribution is subsequently permitted.] \ 2400 bd.
Brandon S. Allbery, moderator of comp.sources.misc and comp.binaries.ibm.pc
{{ames,harvard,mit-eddie}!necntc,{well,ihnp4}!hoptoad,cbosgd}!ncoast!allbery
<<The opinions herein are those of my cat, therefore they must be correct!>>
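
For concreteness, here is a rough Python sketch of the every-word scheme
outlined in the post above: index header and body alike, fold case, drop
syntactic particles, and strip a few suffixes.  It is an illustration under
stated assumptions, not the implementation the poster intended to write; the
PARTICLES and SUFFIXES lists are small stand-ins chosen for this example,
and the suffix stripping is deliberately crude rather than a real stemmer.

```python
import re
from collections import defaultdict

# Assumption: a tiny stand-in for the list of "syntactic particles".
PARTICLES = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with"}
# Assumption: crude suffix stripping; tried in this order, first match wins.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(word):
    """Fold case and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def index_terms(article_text):
    """Every alphabetic word in header and body that is not a particle."""
    return {
        normalize(w)
        for w in re.findall(r"[A-Za-z]+", article_text)
        if w.lower() not in PARTICLES
    }


def build_index(articles):
    """articles: dict of article ID -> full text. Returns term -> ID set."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for term in index_terms(text):
            index[term].add(article_id)
    return index


def lookup(index, word):
    return sorted(index.get(normalize(word), set()))


articles = {
    "<1>": "Subject: Modems\n\nMy modem drops carrier. By the way, uucp "
           "locking files go stale.",
    "<2>": "Subject: locks\n\nHow do I clear a stale lock?",
}
index = build_index(articles)
print(lookup(index, "lock"))   # the passing mention in <1> is found too
print(lookup(index, "the"))    # particles were dropped, so nothing matches
print(len(index), "terms for", len(articles), "short articles")
```

The last line hints at the space worry raised above: even two one-sentence
articles produce an index with many more terms than articles.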
allbery@ncoast.UUCP (Brandon Allbery) (07/11/87)
It has occurred to me that I should provide an example of my assertion that:

As quoted from <2855@ncoast.UUCP> by allbery@ncoast.UUCP (Brandon Allbery):
+---------------
| From experience:  Someone may, in an article keyworded to A, B, C, and D,
| make a reference to E which is so minor as to not deserve keywording...
| until it turns out that that reference answers another person's question,
| but that person never gets to see it on a keyword search for E.  In fact,
| you can change "may" to "will"; it happens all the time.
+---------------

Macro-example (bad keywording):  Find, in the permuted index for the sVr2
manuals, the man page for ftok().  (Hint:  the word "ftok" is not in the
permuted index at all.)

Micro-example (small missed reference; assume, for this case, that keywords
are cross-referenced via the "SEE ALSO" section of a manpage):  Find the
reason that a program under System V which wait()s for a child can act
strangely under multiprocess adb (Plexus).  (Hint:  there's no reference to
ptrace(2) in wait(2).  With keywords, selecting the right ones to look up
articles can require you to know the answer beforehand.)

[All right, this one's slightly unfair; I don't know that multiprocess adb
is in other variants of System V; but it's a clean miss; and wait(2) STILL
doesn't point to ptrace(2).  If anyone's interested, if you set a breakpoint
in the child, the parent might return from the wait() with a status of
Stopped under certain conditions.]