[net.sources] Keyword News proposal, see net.news

bstempleton@watmath.UUCP (Brad Templeton) (10/30/84)
URFC 002                                              K NEWS


                Keyword News System Proposal
                             By
                       Brad Templeton
               Looking Glass Software Limited
                    (brad@looking.uucp)

               USENET Request For Comment 002



     For some time people have  been  using  a  news  system
based on newsgroups.  This is a short outline of my proposal
for a news system based on a classification scheme I call
keywords.  The only essential difference between a newsgroup
and a keyword is that the Keyword news system (or K news) is
designed  so  that  there  is a very small overhead for each
keyword.   It  is  thus  possible  to  have  thousands   and
thousands of active keywords with little overhead.

     It is my feeling that  several  problems  have  emerged
with  the  old  newsgroup  style  system.   Many of them are
solved by K news.

(1)  Due to the limited number of groups, there is a great
     deal of traffic concerning which articles belong in
     which groups and whether certain groups should be
     created or destroyed.  Under K news, there is no need
     for such discussion.  If you want a new keyword, you
     create it.  If you want to use a name that is long and
     descriptive, you can.  If discussions go under several
     keywords, it is easy to add them all to your list.

(2)  The limited number of groups also creates  groups  like
     "net.misc"  and  "net.general"  which  are difficult to
     work with.  K news eliminates the need for net.misc and
     allows easy renaming of net.general.

(3)  Current systems only allow an "or"ing of groups when
     dealing with multiple groups.  In K news, it is possible
     to request articles that deal with a whole set of
     keywords at once, i.e., one can ask to be shown only
     articles that contain both the "science fiction" keyword
     and the "movie" keyword.

(4)  Current systems do not allow grouping all followups  to
     a given article together, or sorting articles according
     to posting date.  K news provides this because it  uses
     sort(1) on the complete list of articles to be seen.

(5)  Current news systems are slower than they should be
     because they must scan each newsgroup a user subscribes
     to in order to see if there is news.  K news does not
     have this problem.


Brad Templeton                                             1

(6)  Current systems just don't allow users to be  selective
     enough in filtering news efficiently.  There's just too
     much volume, and secretary programs  and  the  "n"  key
     aren't  enough.  By providing keywords, we get an extra
     level of selectivity in reading news.

(7)  Current news systems have difficulty  in  showing  each
     article  only once to a given user, particularly if two
     different news reading sessions are  involved.   The  K
     news implementation scheme I suggest does not encounter
     this problem.

1.  The Keyword Environment

     K news can solve these B news problems by promoting a
different environment with keywords.  First of all, the
distribution of an article is taken out of the keyword name.
This means all keywords are valid over all distributions.
The fact that there is a single "auto" keyword means you can
post an auto article to a netwide, statewide or even local
distribution.  This should cut down on the number of people
advertising their cars in "net.auto" merely because the only
auto group has netwide distribution.

     An article will have several keywords.  The K news
system will probably insist that every article carry at
least one keyword from certain required sets.  For example,
there should be a distribution keyword with any article that
is not local.  There might be a "followup" keyword on any
followup, although followups can also be detected from their
"References" string.  Keywords like "spoiler" and "flame" can
be put on articles so that people can request not to see
them.  (Ridiculous groups like net.flame go away.)

     Nearly all articles seem to fall into a small set of
classes: "query", "original information", "reprint",
"opinion" and "followup".  There are some sub-classes, such
as "flame" (a type of opinion) and "source code" (a type of
original information).  It might be a good idea to insist
that all posters provide one of these keywords, with the
followup keyword being added automatically.  A reader could
then shut off all queries or all opinion articles.

     Groups like "net.misc" will no longer be needed.  Any
new discussion can easily rate a new keyword, from "big mac"
to "socks in hyperspace".  The group "net.general" is still
a bit of a problem, but it can now be replaced with
something like "announcement for all users", and there will
be little implicit pressure to put the article in the
netwide distribution.  There will still be problems, but
they will be reduced.

     It's also possible that we will still get a lot of  the
"You  posted  that  to the wrong keyword" type stuff.  It is
hoped that since adding and deleting keywords in a subscrip-
tion  list  will be quite easy, people will not complain too
much about this.  Even so, it is still  possible  that  some
utility  to  help  users  select  keywords will be required.
Each site will keep all known keywords in a DBM  type  file.
(This  will be the total overhead for each keyword.) The DBM
file entry might contain who first used the keyword,  a  one
line  entry  describing it, and its newsgroup mapping on a B
to K interface system.  A simple utility might scan a user's
article for any known keywords that occur in its text and
suggest them as possible entries.  In addition, if the user
proposes a new keyword when posting an article, it would be
wise to search for existing keywords that the new one might
be a misspelling of.
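
A minimal sketch of such a suggestion utility, in Python.  It
assumes the site's DBM keyword file has already been loaded
into a plain dict.  It uses the standard library's
`difflib.get_close_matches` as a stand-in for whatever
spelling check a real implementation would use.  The names
`KNOWN_KEYWORDS`, `suggest_keywords` and `possible_misspellings`
are hypothetical.

```python
import difflib
import re

# Hypothetical stand-in for the site's DBM keyword file:
# keyword -> one line description.
KNOWN_KEYWORDS = {
    "space shuttle": "NASA shuttle flights",
    "microcomputer": "small computers",
    "science fiction": "SF books and stories",
    "movie": "films",
}


def suggest_keywords(article_text, known=KNOWN_KEYWORDS):
    """Return known keywords that occur in the article text."""
    # Lower-case and collapse white space, as K news does for keywords.
    text = re.sub(r"\s+", " ", article_text.lower())
    return sorted(k for k in known
                  if re.search(r"\b" + re.escape(k) + r"\b", text))


def possible_misspellings(new_keyword, known=KNOWN_KEYWORDS, cutoff=0.8):
    """Return existing keywords the new keyword might be a misspelling of."""
    new_keyword = re.sub(r"\s+", " ", new_keyword.strip().lower())
    if new_keyword in known:
        return []          # already a known keyword, nothing to suggest
    return difflib.get_close_matches(new_keyword, list(known), n=3,
                                     cutoff=cutoff)
```

For example, a poster who proposes "Space Shutle" would be
asked whether "space shuttle" was meant.  The 0.8 cutoff is an
arbitrary choice that trades missed typos against spurious
suggestions.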

     Since the keywords, not the subject line, are what get
copied over in a followup, the subject line will not remain
the same.  One current problem under B news is that
discussions wander while keeping the same subject line, and
the subject soon becomes meaningless.  Any followup generated
with K news will have an entirely new subject line, since
both the keywords and the References string indicate what is
a followup to what.

1.1.  Types of Keywords

     Most keywords will be user generated: "microcomputer",
"trs-80", "space shuttle", "frank zappa" and
"homosexuality", for example.  Others will be system
generated.  These are keywords that apply to distribution,
sites and such things.  These keywords will all have an "="
sign in them for matching purposes.  "distribution=usa" would
match articles with usa in the distribution field.
"site=looking" would catch articles from Looking Glass
Software.

     All keywords, when processed by the system, will be
converted to lower case, and every run of white space will be
mapped to a single space.  An "s" on the end of a keyword
will be ignored in comparisons, so we don't have to worry
about pluralization.  Keywords will be sorted into
alphabetical order inside the article so that the same set
of keywords always compares as identical.
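
These normalization rules are easy to state in code.  The
Python sketch below is one reading of them.  The function names
are hypothetical.  Plural folding simply drops one trailing
"s", exactly as described, so it also folds words like "news"
into "new".

```python
import re


def normalize_keyword(keyword):
    """Lower-case a keyword and map every run of white space to one space."""
    return re.sub(r"\s+", " ", keyword.strip().lower())


def comparison_key(keyword):
    """Form used when comparing keywords: a trailing "s" is ignored."""
    k = normalize_keyword(keyword)
    # Naive plural folding as proposed; note it also folds "news" into "new".
    if len(k) > 1 and k.endswith("s"):
        k = k[:-1]
    return k


def canonical_keywords(keywords):
    """Normalized keywords, without duplicates, in alphabetical order."""
    return sorted({normalize_keyword(k) for k in keywords})


def same_keywords(a, b):
    """True if two keyword lists match once case, spacing and plurals are ignored."""
    return (sorted({comparison_key(k) for k in a})
            == sorted({comparison_key(k) for k in b}))
```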




2.  The K news implementation

     To develop a keyword based system, we need a different
implementation scheme than the one used for B news.  In
particular, keywords must have minimal overhead associated with
them.    Things like an entire directory and a line in every
.newsrc file for each keyword can't be used.

     One of the facts that can be used in a new  implementa-
tion is that the average news reader normally reads only the
news that has arrived  since  news  was  last  read.   Thus,
instead  of  scanning  directories and keeping track of what
has been read, K news scans a history file and  keeps  track
of  what has NOT been read.  In a given session, the history
file is scanned from the point in time when  news  was  last
read.   In  addition,  a  file of articles not read from the
previous session is scanned.  The user may  request  to  see
the  old  articles first, or to have them merged in with the
newer ones.  Finding out what to read is a simple matter  of
scanning a few files and should be quite fast.

     I have set out some ideas for implementing the  K  news
system.   The idea breaks the news software into a series of
simple,  efficient  modules.   This  scheme  could  also  be
applied  to other news systems.  I will present a brief sum-
mary of the modules with more details further on.

(1)  The "inews" program  takes  articles,  stores  them  in
     files  and  writes  out history records describing each
     article.  One history file is kept per day.

(2)  The subscription filter program grabs a list  of  arti-
     cles  not  yet seen from the history files, and matches
     them against the user's subscription file.   It  writes
     out a file containing a list of articles the user wants
     to see according to the subscriptions.

(3)  Any standard sort  program  sorts  the  output  of  the
     filter  program  to provide a list of articles the user
     wants to see, in the order the user wants to see them.

(4)  A variety of user interface programs read in the list
     of articles to see, and present these articles in a way
     the user likes.  Since most of the work has already been
     done by the earlier stages, there can be several of these.

(5)  Various utilities for use by  user  interface  programs
     will exist, including joke decryption, following up and
     subscription file management.  A special utility  would
     exist  to take all the articles to be seen and put them
     into a "batch" for  sending  to  other  systems.   Thus
     other systems become just like users, with subscription
     files.

     Here follows more detail.

2.1.  Receiving Posted and Transmitted News

     The "inews" equivalent of K news should be quite simple
to implement.  When an article comes in, all it needs to do
is place the article in a file somewhere (it could even let
B news do this for it) with possible header processing.  Once
the article has been placed, a header record must be written
out to the K news history file for that date, describing
various header attributes of the article and what file it
was put in.  A separate transmission mechanism is not needed
if news is to be batched, although one could still be
included.

     As noted above, the article can be put in a file  by  a
special  program  that  returns  the name of the file.  This
puts the operating system related things in  a  simple  pro-
gram, and makes the system more portable.  Whatever the pro-
gram is that places the article in a file, the  filename  is
passed to the K news pickup program.  This program will take
the article, and examine the header.  Important  information
about the article will be written to a special history file.
This will include the keywords associated with the  article,
the "References:" string of the article plus its message-id,
the date of posting, the pathname of the file containing the
article  with  optional seek address and length, and finally
the subject line.  Note, by the way, that in the case  of  a
followup,  any  extra keywords that were not in the original
article will have to be placed in an extra field so they are
not  involved in the sorting that groups articles with their
followups.
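
One possible layout for such a history record, sketched in
Python.  The field order, the tab separator and the class name
`HistoryRecord` are assumptions, not part of the proposal.  As
the text requires, a followup's extra keywords get their own
field so that they stay out of the sort.  Since "," is among
the characters the proposal reserves for delimiting keywords,
it is used here to join them.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryRecord:
    """One line of a K news history file (hypothetical layout)."""
    keywords: list          # base keywords, normalized and sorted
    references: str         # References: chain plus this article's message-id
    posted: int             # posting date, seconds since the epoch
    path: str               # file holding the article
    subject: str
    offset: int = 0         # optional seek address within `path`
    length: int = 0         # optional length; 0 means "whole file"
    extra_keywords: list = field(default_factory=list)  # added by a followup

    def to_line(self):
        """Tab-separated fields; "," (a reserved character) joins keywords."""
        return "\t".join([
            ",".join(self.keywords), ",".join(self.extra_keywords),
            self.references, str(self.posted), self.path,
            str(self.offset), str(self.length),
            self.subject.replace("\t", " "),
        ]) + "\n"

    @classmethod
    def from_line(cls, line):
        """Parse a line written by to_line()."""
        kw, extra, refs, posted, path, offset, length, subject = \
            line.rstrip("\n").split("\t")
        return cls(kw.split(",") if kw else [], refs, int(posted), path,
                   subject, int(offset), int(length),
                   extra.split(",") if extra else [])
```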

     History files will normally be maintained on a one per
day basis, in a special history directory.  Each history
file's name will be formed from the date for that history
file (perhaps in days since the birthday of the net, or
perhaps in the form yymmdd).  A site could instead start a
new history file each week, or even every hour, as it
requires.  K inews will query the date and time from the
system and decide which history file to append to.
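
A Python sketch of how K inews might pick the file name.  The
scheme names and the weekly format are hypothetical.  The
epoch for the "days since the birthday of the net" scheme is
an arbitrary placeholder, not a claimed historical date.

```python
import datetime

# Arbitrary, hypothetical epoch for the "days since the birthday of the net" scheme.
NET_EPOCH = datetime.date(1980, 1, 1)


def history_file_name(when, scheme="daily"):
    """Name of the history file that an article arriving at `when` goes into."""
    if scheme == "daily":                       # yymmdd, e.g. 841030
        return when.strftime("%y%m%d")
    if scheme == "hourly":                      # yymmddhh
        return when.strftime("%y%m%d%H")
    if scheme == "weekly":                      # yy + "w" + ISO week number
        year, week, _ = when.isocalendar()
        return "%02dw%02d" % (year % 100, week)
    if scheme == "days":                        # days since NET_EPOCH
        return str((when.date() - NET_EPOCH).days)
    raise ValueError("unknown scheme: " + scheme)
```

Within one scheme, the daily and hourly names sort in time
order (within a century).  A reader program can therefore find
its starting file with a plain string comparison.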

2.2.  News reading stage one - history filter

     The first stage of any news reading is the history
filter that is common to all news reading and collecting
programs.  This program first notes the last time the user
read news and finds the appropriate spot in the history
files.  The list of articles from that point on is combined
(if the user has requested it) with a special per-user list
of articles that have already been processed, but which the
user has decided to read later.  (As this user file is
already in the proper order, the merging may actually take
place later to be more efficient.)
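
A rough Python sketch of this first stage.  It assumes,
hypothetically, one history file per day named yymmdd.  It also
assumes inews writes the article's arrival time as the first
tab-separated field of each history line; arrival time is used
rather than the posting date because articles can arrive late.
The function names are illustrative only.

```python
import os


def unseen_history_lines(history_dir, last_read, last_read_day):
    """Yield history lines for articles that arrived after `last_read`.

    last_read is a time in seconds; last_read_day is the yymmdd name of the
    history file that was current at that time.
    """
    for name in sorted(os.listdir(history_dir)):
        if name < last_read_day:
            continue        # the whole file predates the last session
        with open(os.path.join(history_dir, name)) as f:
            for line in f:
                arrived = int(line.split("\t", 1)[0])
                if arrived > last_read:
                    yield line


def candidate_articles(history_dir, last_read, last_read_day,
                       unread_file=None, old_first=False):
    """New history lines, combined with the articles the user put off."""
    new = list(unseen_history_lines(history_dir, last_read, last_read_day))
    old = []
    if unread_file and os.path.exists(unread_file):
        with open(unread_file) as f:
            old = f.readlines()
    return old + new if old_first else new + old
```

Only the files from the last session onward are opened, which
is where the speed of this scheme comes from.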

     Now the system has a list of possible articles to read.
It  must  decide  which  ones  the user wants to see.  To do
this, we use a user created "subscription file".  This  file
contains  a  list  of keyword patterns describing the user's
taste in articles.  The subscription file is read in and
parsed into a tree.  As described below, the subscription
file contains keywords and keyword patterns that will be
matched against articles.  Each pattern is given a "sort
value" that indicates how important the associated keywords
are.  This sort value may either be explicit or be derived
implicitly from the order of the subscription file.  Articles
will be shown in the order dictated by the sort values of
their keywords, so users can control the order in which they
see their news.

     Lines from the history file are read  in,  and  matched
against the subscription list.  If they match, the appropri-
ate line is written out onto a temporary file.  Matching can
be  done  on  keywords  or  other  information,  such as the
article-ids in followup  chains,  the  poster,  the  posting
site,  the  distribution  and  anything  else that is imple-
mented.  It is important to note that the ability  to  match
on  article-ids  allows users to request or shut out discus-
sion chains based on followups.  Instead of writing out  the
keywords  to  this  file,  we  write instead the sort values
given to each keyword.  These  sort  values  are  themselves
sorted  on  the line before being written.  The old keywords
are also output, but not for the purposes of sorting.

     Once the new  file  is  prepared  it  is  sent  off  to
sort(1),  possibly with the file of previously skipped arti-
cles appended to it.  The first sort key is the keyword sort
values.  Since  followups  all  have the same base keywords,
they will match as equal in the first sort  key.   Since  we
are sorting by the keyword sort values, the output file will
have the articles sorted by  keywords  in  the  presentation
order  the user requested.  The next sort key is the "Refer-
ences" chain, which includes the message-id of  the  article
if it is an original article.  Thus all followups to a given
article are sorted in a nice tree.
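
A Python sketch of the filter output and the sort that
follows.  It simplifies matching to "subscribed if any keyword
has a sort value", leaving out full pattern matching.  It
compares strings byte-wise, as sort(1) does in the C locale.
The function names are hypothetical, and the sort values in
the example ("AAA", "bb", "cc") are borrowed from the
subscription examples in section 3.

```python
def sort_line(keywords, references, path, sort_values):
    """Build one filter output line, or None if no keyword is subscribed.

    The keywords are replaced by their sort values, which are themselves
    sorted; the References chain and the article file follow.
    """
    values = sorted(sort_values[k] for k in keywords if k in sort_values)
    if not values:
        return None
    return "\t".join([" ".join(values), references, path])


def presentation_order(articles, sort_values):
    """Sort the lines on field 1 (sort values), then field 2 (References)."""
    lines = [sort_line(k, r, p, sort_values) for k, r, p in articles]
    lines = [line for line in lines if line is not None]
    return sorted(lines, key=lambda line: line.split("\t")[:2])
```

Because a followup's References chain begins with its
original's message-id, the followup sorts directly after that
original.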

     Other information output includes the date of  posting.
While  we want to sort on this date for articles at the same
"level" (on a followup basis), it  is  impossible  for  most
sort  programs  to  do this.  This sorting must be done in a
second pass (it's fairly simple) or right  within  the  user
interface program.

     Any amount of additional information can be  output  to
this  file.  In theory, most of the header information could
be written (lines would start getting pretty long)  so  that
the  user  interface  program need not even open the article
file for articles a user says "no" to.  This is a  trade-off
to  be worked out.  One important item that has to be there,
of course, is the name of the file where the  article  actu-
ally resides.

     Once sort is called we will have a file which  has,  in
addition  to  a  lot of garbage, a list of pathnames for the
articles the user wishes to read.   The  keywords  on  these
articles may also be present.  This is passed to phase two.

2.3.  Date & Discussion Sorting

     A special pass may be used to sort by the date within a
discussion,  since  many  will  want this.  This is a simple
task that can be left to the user interface phase, but it
could also be done as a general utility for any interface to
use.  That would be slower, since a whole extra pass would be
required.
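
A plain sort(1) cannot do this step because replies at the
same level are keyed by message-id, not by date.  A separate
pass can instead rebuild the discussion tree from the
References chains and walk it, ordering each level by posting
date.  The Python sketch below works this way.  It assumes
each entry carries its References chain ending in the
article's own message-id, and that it is applied to the
articles of one keyword group.  The name `thread_order` is
hypothetical.

```python
from collections import defaultdict


def thread_order(articles):
    """Return article paths as discussion trees, each level sorted by date.

    articles: (references, posted, path) tuples, where `references` is the
    References chain ending with the article's own message-id.
    """
    info = {}                       # message-id -> (posted, path)
    for refs, posted, path in articles:
        info[refs.split()[-1]] = (posted, path)

    children = defaultdict(list)    # parent message-id -> reply message-ids
    roots = []
    for refs, _, _ in articles:
        chain = refs.split()
        # Attach to the nearest ancestor we actually have; an article whose
        # ancestors have all expired starts a tree of its own.
        parent = next((m for m in reversed(chain[:-1]) if m in info), None)
        if parent is None:
            roots.append(chain[-1])
        else:
            children[parent].append(chain[-1])

    order = []

    def walk(ids):
        for mid in sorted(ids, key=lambda m: info[m][0]):
            order.append(info[mid][1])
            walk(children[mid])

    walk(roots)
    return order
```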

2.4.  User Interface

     User interface programs will vary from being dumb to
quite fancy.  Since each one is passed a ready-made list of
articles, there is not much work to do.  All a simple one
need do is go through the list, doing what msgs or readnews
currently does with each file.  These programs will handle
replies, followups etc.  Special utilities will be provided
for cancelling etc.

     When a user skips an article, the program can write the
appropriate line to the unread article file noted above.  It
is hoped the average user will not let  this  file  get  too
big.   More sophisticated programs will keep track of a list
of seek addresses in the sort output file that mark articles
that  have  not  been  read, and output this at the end of a
session.  This lets users skip back and forth among the
articles, since the information is not written out until the
end.  In fact, such bookkeeping might be a useful utility to
provide for writers of user interfaces.
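
A minimal Python sketch of that bookkeeping, assuming the
sort output has one line per article.  The class and method
names are hypothetical.  Every article starts out unread, and
reading one removes its seek address from the set.  Only at
the end of the session are the remaining lines appended to the
user's unread-article file.

```python
class ReadingSession:
    """Remembers unread articles as seek addresses into the sort output."""

    def __init__(self, sort_output_path):
        self.path = sort_output_path
        self.offsets = []                 # seek address of each article line
        with open(sort_output_path, "rb") as f:
            while True:
                pos = f.tell()
                if not f.readline():
                    break
                self.offsets.append(pos)
        self.unread = set(self.offsets)   # nothing has been read yet

    def line(self, index):
        """Fetch the line for article `index`; skipping back is just a seek."""
        with open(self.path, "rb") as f:
            f.seek(self.offsets[index])
            return f.readline().decode()

    def mark_read(self, index):
        self.unread.discard(self.offsets[index])

    def finish(self, unread_path):
        """End of session: append every still-unread line to the unread file."""
        with open(self.path, "rb") as src, open(unread_path, "ab") as dst:
            for pos in sorted(self.unread):
                src.seek(pos)
                dst.write(src.readline())
```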

     User interfaces can get quite fancy, with  screen  sys-
tems  like notesfiles and rn.  It would be nice to provide a
feature so that unrecognized  commands  are  passed  to  the
shell  with a search path list including a special directory
for news commands.  (Perhaps an environment variable could
let the user specify it.)  In the news command directory you
put simple commands like "decrypt" and "undigest" with
appropriate short names.  It is expected that several user
interfaces will be written, including one just like B news
and one just like notesfiles.  All access to the subscription
file by the user interface program should be through other
programs that are part of phase one if possible.  This keeps
things apart.

2.5.  B and K news Interface and Transition

     In the design of K news, we can plan for three  schemes
of  usage.  One is to design K news without paying attention
to any other news systems.  This would require creation of a
totally  new  net  that  won't talk to newsgroup based nets.
This would be slow, but has the appeal that it would  create
a  net  that  wasn't bogged down the way the current one is.
This "let them stew in their mess" attitude is a bit  snobby
though, and could create a lot of problems in getting K news
accepted.  Another thing to consider is that there is a high
probability  somebody  will put together some kind of inter-
face between systems that is jury-rigged and  far  from  the
best.   This  happened  with the Notes-B news interface, and
created a royal mess that was worse than the  problems  that
would have resulted from working together on things.

     Another scheme is to make a system that  can  interface
to  B news, but doesn't plan to for long.  The idea would be
that if K news were good enough, everybody would  eventually
switch and we would have a new pure system.  In the meantime
they could co-exist.   Aside  from  the  technical  problems
involved,  there  is  the  question of when the switch would
occur, and if the idea of newsgroups would ever get  out  of
the system.

     The  compromise  solution  is  to  plan  for  permanent
cooperation by incorporating the newsgroup idea into K news.
A newsgroup becomes a special, high overhead keyword.  In  K
news,  it  is used as a directory name for storing articles,
and as the interface to B news.  In this system,  we  demand
that  K news users provide newsgroups as well as keywords on
their articles.  Although this has some problems in  educat-
ing  the  users,  I  think it is no worse than sticking with
newsgroups.

     If newsgroups exist, and B news sites exist, a  mechan-
ism  is  required  that  maps newsgroups to appropriate key-
words.   One  simple  mechanism  is  just   to   include   a
Newsgroup=xxx  keyword for each newsgroup an article belongs
in.  K news users can select that keyword in their subscrip-
tion file.  Slightly more sophisticated would be to create a
mapping table at B to K interface sites so that articles  in
a  group  like  "net.columbia"  get  keywords  of  the  form
"Newsgroup=net.columbia" and "space shuttle".

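Both mechanisms fit in a few lines.  In this Python sketch,
which is an illustration rather than a specification, the
mapping table's contents and the function name are
hypothetical.  The keywords come out in lower case, since K
news folds all keywords to lower case anyway.

```python
# Hypothetical mapping table kept at a B to K interface site.
NEWSGROUP_KEYWORDS = {
    "net.columbia": ["space shuttle"],
    "net.auto": ["auto"],
}


def keywords_for_b_article(newsgroups_header, table=NEWSGROUP_KEYWORDS):
    """K news keywords for a B news article, from its Newsgroups: header."""
    keywords = set()
    for group in newsgroups_header.split(","):
        group = group.strip().lower()
        if not group:
            continue
        keywords.add("newsgroup=" + group)     # the simple mechanism
        keywords.update(table.get(group, []))  # plus any mapped keywords
    return sorted(keywords)
```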

2.6.  Shipping to other sites

     With new modifications to uux possible, it is
envisioned that each site receiving news from a K news site
would essentially have a .newsrc like file on the forwarding
site.  That is to say, each site would be in the same
position as a user, with a keyword subscription list and a
list of unread articles.  Forwarding could either be done by
using the same process a user  does  to  read  news  when  a
transfer is made, or by having K inews check each incoming
article against the subscription files of known sites.  The
first way, of course, is much more efficient.  With batching, the
first stage readnews process could be  run  to  collect  the
chosen files in a batch.

2.6.1.  Distribution

     In order to keep a  site's  subscription  file  simple,
distribution  keywords  (required  on  all articles) will be
matched by  "distribution=xxx",  where  xxx  is  stuff  like
"local", "canada", "usa", and the dreaded "worldwide" (equal
to "net").  The default  distribution  for  posted  articles
will  be  set  locally, but it should be encouraged to be as
small as reasonable, such as the local state or province.

     One problem with this sort of distribution scheme  (and
the  current  B system) is that sometimes a user really does
want an article distributed netwide in the "auto"  newsgroup
but  only locally in the "general" newsgroup.  Consideration
must thus be given to explicit distribution bindings on
keywords.  My suggestion is to have the "distribution"
keywords (as we think of them now) apply to all keywords,
except those with an explicit distribution.  For example,
consider an article with these headers:

    Subject: Toronto Space museum opens
    Distribution: local
    Keywords: events, space/north america

Such an article would go to "events" readers locally, and
"space" readers both locally and all over the continent.

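A Python sketch of this binding rule.  It assumes keywords are
separated by commas in the Keywords: line and that a "/"
suffix names the keyword's own distribution.  How a site
decides which distributions it belongs to is not covered here;
the sketch simply takes that set as input.  The function names
are hypothetical.

```python
def keyword_distributions(keywords_header, default_distribution):
    """Map each keyword to its distribution; "keyword/dist" overrides the default."""
    result = {}
    for item in keywords_header.split(","):
        item = " ".join(item.split()).lower()   # collapse white space
        if not item:
            continue
        keyword, slash, dist = item.partition("/")
        result[keyword.strip()] = (dist.strip() if slash
                                   else default_distribution.lower())
    return result


def keywords_for_site(keywords_header, default_distribution, site_distributions):
    """Keywords under which the article should be offered at a given site."""
    dists = keyword_distributions(keywords_header, default_distribution)
    return sorted(k for k, d in dists.items() if d in site_distributions)
```

For the example above, the posting site in Toronto, whose
set includes "local", would offer the article under both
"events" and "space".  A site elsewhere in North America would
offer it only under "space".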

3.  Subscription List

     One of the most important facets of the K  news  imple-
mentation  I propose is the use of a sophisticated subscrip-
tion list.  This list would be used by both users and  sites
to  decide  what  articles  are to be seen during a session.
Fundamental to this scheme is the ability to define  keyword
patterns,  so that selections can be done on not just single
keywords (as B news works) but on arbitrary combinations.

     The first reading program will maintain two files.  The
first  of  these is the subscription list.  This tells which
keywords and discussions the user is  interested  in.   This
will be a list of keywords subscribed to and boolean
expressions built from them.  Keywords are actually text
strings, but they may not contain a special set of characters
which are used to delimit them.  These characters are "=",
":", ",", "!", "[", "]", "&", "|", "*", "/", "(", and ")" to
start with.  Some, like "=", are used within meta-keywords
to  match  special  conditions  known  to  the software like
sites, article-ids and the  like.   No  doubt  more  special
characters  should  be  reserved  for future use, while some
should be allowed within keywords.  Each line in a subscrip-
tion  file  consists  of  a  keyword pattern to describe the
user's interests.  In addition, some special  lines  in  the
subscription  file  will  tell what the user wants done with
articles from the previous  session,  and  possibly  special
options.

     A typical subscription line lists  a  keyword  pattern.
For example, the line:

    science fiction

asks for all articles with the keyword "science fiction".
Quotes may be required, but this is a matter to be decided.
It also makes sense that any runs of white space in a keyword
be compressed to one space so that typos do not cause
problems.  The line "!star wars defence" would ask that no
articles with the keyword "star wars defence" be shown.  We
can also ask for "Ronald Reagan & taxation" to get all
articles with both of the keywords.  Similarly "Ronald Reagan
& !taxation" shows us all articles about old Ron that do not
contain the taxation keyword.  Or we could go for

    Ronald Reagan & ( taxation | star wars defence )

which shows us articles about Ron that also deal with
taxation or with the star wars defence scheme.
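
These boolean patterns can be handled by a small
recursive-descent parser, sketched here in Python under
several assumptions:

- "&" binds more tightly than "|" (the text does not say).
- "*" matches any article with at least one keyword.
- Only "&", "|", "!" and parentheses are recognized.
- The pattern is evaluated against the article's keywords as
  a plain set, without the trailing-"s" folding.

`PatternParser` and `matches` are hypothetical names.

```python
import re

# Operators handled by this sketch; the other reserved characters are not.
TOKEN = re.compile(r"\s*([&|!()])\s*|([^&|!()]+)")


def tokenize(pattern):
    tokens = []
    for op, word in TOKEN.findall(pattern):
        if op:
            tokens.append(op)
        elif word.strip():
            tokens.append(" ".join(word.split()).lower())
    return tokens


class PatternParser:
    """expr := term ("|" term)*;  term := factor ("&" factor)*;
    factor := "!" factor | "(" expr ")" | keyword."""

    def __init__(self, pattern):
        self.tokens = tokenize(pattern)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise ValueError("unexpected %r" % self.peek())
        return node

    def expr(self):
        node = self.term()
        while self.peek() == "|":
            self.take()
            node = ("or", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "&":
            self.take()
            node = ("and", node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "!":
            return ("not", self.factor())
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise ValueError("missing )")
            return node
        if tok is None or tok in ("&", "|", ")"):
            raise ValueError("keyword expected")
        return ("kw", tok)


def matches(node, keywords):
    """Evaluate a parsed pattern against an article's set of keywords."""
    kind = node[0]
    if kind == "kw":
        return bool(keywords) if node[1] == "*" else node[1] in keywords
    if kind == "not":
        return not matches(node[1], keywords)
    if kind == "and":
        return matches(node[1], keywords) and matches(node[2], keywords)
    return matches(node[1], keywords) or matches(node[2], keywords)
```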

     The order in the file is  important.   When  phase  one
tries  to  figure  out if a user wants to see an article, it
scans through the information in the subscription  list,  in
order.   It  stops as soon as it finds some form of definite
information.  This  means  either  positive  information  or
negative  information.   If the first line in your subscrip-
tion file is "Ronald Reagan", you will see  all  such  arti-
cles,  even  if  they  contain other keywords that you hate.
Likewise, if the first line in the file is "!Ronald Reagan",
you will never see an article about him, even if it contains
a keyword you subscribe to later on.  (There is an alternate
system described below to change this.)
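
The first-match rule can be sketched in Python as follows.  To
stay short, this sketch only understands lines built from
"&"-joined keywords, each optionally prefixed with "!".  A "!"
in front of the whole line marks a negative line, whose match
means "do not show".  Articles that match no line are assumed
not to be shown.  The function names are hypothetical.

```python
def parse_line(line):
    """Split a subscription line into (negative, [(keyword, wanted), ...])."""
    line = " ".join(line.split()).lower()
    negative = line.startswith("!")
    if negative:
        line = line[1:]
    terms = []
    for term in line.split("&"):
        term = term.strip()
        wanted = not term.startswith("!")
        terms.append((term.lstrip("!").strip(), wanted))
    return negative, terms


def line_matches(terms, keywords):
    """True if wanted keywords are all present and unwanted ones all absent."""
    for keyword, wanted in terms:
        present = bool(keywords) if keyword == "*" else keyword in keywords
        if present != wanted:
            return False
    return True


def shown_first_match(subscription_lines, keywords):
    """Scan lines in order; the first line that matches decides."""
    for line in subscription_lines:
        negative, terms = parse_line(line)
        if line_matches(terms, keywords):
            return not negative
    return False        # no definite information: assume not wanted
```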

     The character "*" will match any keyword.  It would  be
placed  on  the last line of a subscription file to indicate
that any keyword not marked with an "!"  is  subscribed  to.
It  is  doubtful  anybody would use this after the number of
keywords grows.

     Keywords may have "sort attributes" on them to indicate
which  keywords  you  would  like to see first in a session.
These are essentially ASCII strings which will be passed to
sort(1).   If  you  want to see articles about "system shut-
down" first, you give it a low value like "A".   If you want
to  see articles about "big mac" last you give a priority of
the form "zzzzzzzz".  The nice thing about this is that when
you  have  a  new keyword, you can easily give it a priority
between any two that exist, unless you have given  something
a  priority  like  "^@", in which case it would be first for
all time.  We now see lines like:

    system shutdown [AAA]
    space [bb] & challenger [cc]
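
A Python sketch that collects these sort attributes from a
subscription file, producing the keyword-to-sort-value table
the phase one program needs.  Keywords without a bracketed
attribute get an implicit value derived from their line
number, which is one possible way of deriving order from the
file.  The "m" prefix and zero-padded line number are
arbitrary, hypothetical choices.  They place unmarked keywords
after "bb" and before "zzzzzzzz" in the examples above.

```python
import re


def sort_values(subscription_lines):
    """Map each keyword to the string that sort(1) will order it by."""
    values = {}
    for lineno, line in enumerate(subscription_lines, 1):
        # Keywords sit between the boolean operators.
        for term in re.split(r"[&|!()]", line):
            m = re.fullmatch(r"\s*([^\[\]]*?)\s*(?:\[([^\]]*)\])?\s*", term)
            if not m or not m.group(1):
                continue
            keyword = " ".join(m.group(1).split()).lower()
            explicit = m.group(2)
            # Hypothetical implicit value: "m" + zero-padded line number.
            values.setdefault(keyword, explicit if explicit is not None
                              else "m%04d" % lineno)
    return values
```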


3.1.  Sample file

     Here are some sample subscription lines that you  might
have.   The  comments  actually  would  not  be in the file,
although that could be a possible feature.

    OPTIONS: +newkeywords +oldnews  ; show me new keywords that have come in,
                                    ; and mix in my old news from before
    !flame                          ; show me no flame articles
    !query                          ; show me no "does anybody have" articles
    system news
    microcomputer & !trs-80         ;anything on micros that isn't on trs-80s
    unix & !(4bsd | version 7)
    sex & drugs                     ; anything about both
    rock & roll                     ; 8-)
    site=looking & poster=brad      ; anything from me - the default ;-)
    movies & distribution=ontario   ; movie articles from my own province only
    distribution=local              ; anything posted on my own machine
    art=123@looking                 ; that article and any followups
    !art=124@looking                ; none of that article or any followups
    !(!source code & size>7K)       ; a possible feature, no file bigger than 7k
                                    ; that isn't a source file


4.  Typical Session

     The typical user interface program will first check  to
see  what  new keywords have come in since the last session.
These will be recorded in a separate history file  in  which
the last position read must be recorded.  The user, if it is
requested by appropriate options, will then be given a  list
of  new keywords that have appeared since the last time news
was read.  Some systems will query the user and allow him or
her to place these new keywords in the subscription files.

     The user interface must now call the phase one program
with  appropriate  options, and the name of a temporary file
to put the sort output in.  It may  also  request  the  sort
output  on  a  pipe if that is all it needs.  (Most programs
will want to be able to  seek  back  in  the  output  file.)
Articles will then be shown in the order requested, grouped
perhaps according to followup discussions or higher priority
keywords.  At the end, a list of unread articles will be
written out.  When posting a followup, the software will
insist on a change of subject and allow the addition of
keywords and a change of the distribution.

5.  Alternate Subscription Idea

     It is possible users will require more control on which
subscription  lines get priority than the order in the file.
Thus it is proposed that keywords get points  based  on  how
much  a  user  wants to see a keyword.  Keywords you want to
see would get positive points and keywords you don't want to
see  would get negative points.  For example: "Ronald Reagan
: 5" would assign 5 points to any  article  containing  that
keyword.   On  the  other  hand "star wars defence : -4" and
"taxation : -6" would assign negative points to  those  key-
words.  In this case, you would see articles with Reagan and
star wars defence, but would not see  articles  with  Reagan
and taxation.  Scores would apply to whole lines.  For exam-
ple:

    (Ronald Reagan [abc] & taxation [cde]) : 20

Would give 20 points to any article with both keywords.

     In this system, every article must be checked against
the whole list.  For every match we get, we add the points
assigned for that match to our sum.  If, at the end, the sum
is >= 0, the user sees the article.  If negative, it is not
seen.  It should also be possible to assign scores of "oo"
and "-oo", which would represent infinite scores and stop the
scan right away.
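
A Python sketch of the points scheme, under simplifying
assumptions.  Each line is a "&"-joined set of keywords
followed by ":" and a score, and "oo" and "-oo" stand for
infinite scores as proposed.  Parentheses and [sort]
attributes are dropped rather than parsed.  The function
names are hypothetical.

```python
import math
import re

SCORE_WORDS = {"oo": math.inf, "-oo": -math.inf}


def parse_scored_line(line):
    """'pattern : score' -> (set of keywords that must all be present, score)."""
    pattern, _, score = line.rpartition(":")
    score = score.strip()
    value = SCORE_WORDS[score] if score in SCORE_WORDS else int(score)
    cleaned = re.sub(r"\[[^\]]*\]|[()]", " ", pattern)
    required = {" ".join(k.split()).lower() for k in cleaned.split("&")}
    return required, value


def article_score(scored_lines, keywords):
    """Scan the whole list, summing the scores of every matching line."""
    total = 0
    for line in scored_lines:
        required, value = parse_scored_line(line)
        if required <= keywords:
            if math.isinf(value):
                return value        # "oo" or "-oo" ends the scan at once
            total += value
    return total


def shown_by_score(scored_lines, keywords):
    return article_score(scored_lines, keywords) >= 0
```

With the Reagan example, an article with "star wars defence"
scores 5 - 4 = 1 and is seen, while one with "taxation" scores
5 - 6 = -1 and is not.  Under this rule, an article matching no
line at all scores 0 and is shown.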

     In any system, by the way, the whole subscription file
must be read into RAM.  Since the phase one program has little
to do besides reading this file, however, the K news system
should be able to handle large subscription files.  Because
followup message-ids will also be placed in this file, a
utility that deletes very old ones would be a good idea.
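     Such a cleanup utility might look like the sketch below.
It assumes each line of the subscription file carries the time
it was added (as suggested in idea (5) of the list of further
ideas) and that message-id entries look like "123@looking" or
"!123@looking".  The in-memory format and the function name are
hypothetical.

```python
import re
import time

SECONDS_PER_DAY = 24 * 60 * 60


def prune_old_followups(entries, max_age_days, now=None):
    """Drop old message-id entries from a subscription list.

    entries: hypothetical (line, added_at) pairs, where added_at is a Unix
    timestamp assumed to be recorded when the line was added. Only lines
    that look like '123@looking' or '!123@looking' are candidates for
    removal; all other (user-written) lines are always kept.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    kept = []
    for line, added_at in entries:
        is_message_id = re.fullmatch(r"!?\S+@\S+", line) is not None
        if is_message_id and added_at < cutoff:
            continue  # discussion presumably dead: forget the entry
        kept.append((line, added_at))
    return kept
```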

5.1.  More Random Ideas

     We can add subscription features as we like; what users
actually want will have to be worked out.  Some ideas include
the scheme above, plus:

(1)  The ability to match a keyword only if it is  alone  on
     the  line.  For example, you might want to see articles
     about "microcomputers" but not if they  are  associated
     with other topics.  Same with "abortion".

(2)  Real pattern matching on keywords, regular expression
     style.  This might be too slow: disallowing it lets the
     keyword programs map the keywords seen to integers for
     easy matching.  But it might be worth it.

(3)  Pattern matching on the  subject.   This  is  something
     various  news  secretaries  do.  In theory, this should
     not be necessary as any important word you might search
     for would probably be a keyword.

(4)  Pattern matching on the body.  This could  be  done  by
     means of those special hash formulae (such as csh uses)
     that tell if a given string is NOT within  an  article,
     with some reliability.

(5)  Timestamps on patterns added by programs  to  the  sub-
     scription files.  When you decide to shut off a discus-
     sion, the software will add a  "!123@looking"  to  your
     file.  You don't want these to build up, so it might be
     good to have timestamps on them so  that  they  can  be
     removed later on once a discussion is dead.

(6)  Piles more in the way of special keywords in the
     required group, so people can be more specific: for
     example, different types of classified ads.

(7)  Facilities for moderators.  Ability to pattern match on
     the moderator of choice.
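     Idea (4) can be illustrated with a Bloom filter, a standard
hashing technique with exactly the property described: a "not
present" answer is certain, while a "present" answer may be a
false positive.  This is a swapped-in technique and not
necessarily what csh does.  The sketch assumes a filter is
built once per article body (split on whitespace) and stored
with the article; the class name, table size and number of
hashes are arbitrary choices for illustration.

```python
import hashlib


class BodyFilter:
    """Bloom filter over the words of one article body (hypothetical name).

    might_contain() returning False means the word is certainly NOT in the
    body; True means it is probably there (false positives are possible).
    """

    def __init__(self, body: str, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit table
        for word in body.lower().split():
            for pos in self._positions(word):
                self.bits |= 1 << pos

    def _positions(self, word: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def might_contain(self, word: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(word.lower()))
```

A subscription that matches on the body could skip any article
whose filter answers False without reading the article at all.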

6.  Criticism and Answers

     Of course no system is perfect and  some  have  pointed
out  a few problems that may arise with K news.  For most of
these, I feel that the problem  is  even  worse  with  news-
groups, or at least little better.

     The main objection is that some people feel there are
already too many newsgroups to remember them all.  Some feel
that with the proliferation of keywords, users will be less
certain what keyword to use, and will post to the wrong keyword
more often.  Thus some important information that you might
have seen could be lost.

     It's my opinion that far more important information is
lost today because of the noise that results from newsgroups
being too general in scope.  I, and many others, have
unsubscribed from groups we are interested in because we can't
handle all the garbage in the group to sort out the gems.  I
also use the "n" key a great deal: on over 70% of the articles
in groups I do read.  If the subject is too short, or is
"Orphaned response" or that sort of thing, I say "n" right
away.

     To keep misposting down, the answer is more software.  As
the need arises, we might see fancy programs to help people
find the right keywords.  Whenever somebody creates a keyword,
it will be their duty to write a short description of it,
including related words.  Thus the creator of "ronald reagan"
would add a line saying:

    president, arms race, abortion, economy, usa, government, politics

and an appropriate utility could take words from the user
(perhaps even the text of an article) and "grep" for those
words in the keyword list.  This would be a special utility
called by the news posting program, so it could be written and
maintained at yet another location.  This tool could also use
standard spelling correction algorithms to suggest keywords.
Naturally, news administrators could update these keyword
descriptions if the creator of the keyword didn't come up with
a good one.  A control message could even keep the file up to
date.
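     A keyword-suggesting tool of this kind might be sketched as
below.  It assumes the description file has been loaded as a
mapping from keyword to its list of related words.  Python's
difflib similarity ratio stands in for "standard spelling
correction algorithms"; that choice, the 0.8 cutoff and the
function names are assumptions.

```python
import difflib
import re


def build_term_index(descriptions):
    """Map each word of a keyword name or its description to its keywords."""
    index = {}
    for keyword, related in descriptions.items():
        for phrase in [keyword, *related]:
            for word in phrase.lower().split():
                index.setdefault(word, set()).add(keyword)
    return index


def suggest_keywords(text, descriptions, cutoff=0.8):
    """Suggest keywords for some user text (hypothetical helper).

    An exact word hit plays the role of the "grep"; a close but inexact
    match (similarity >= cutoff) acts as a crude spelling correction.
    """
    index = build_term_index(descriptions)
    terms = list(index)
    suggestions = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        for term in difflib.get_close_matches(word, terms, n=3, cutoff=cutoff):
            suggestions |= index[term]
    return sorted(suggestions)
```

With the "ronald reagan" description line above loaded, text
mentioning "economy", or a misspelled "presidnet", would both
suggest "ronald reagan".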

7.  Comments

     This is just a draft proposal, and lots of little details
are missing.  Comments are welcome.  Also welcome is somebody
to implement the thing, since many people are too busy to do
so.  The implementation could be done in pieces, and much of
the code can be taken from the existing B news, since the same
header formats etc. would be used.  I can be reached at
watmath!looking!brad or watmath!bstempleton.  Watmath is called
by ihnp4, decvax, utzoo, allegra, utcsrgv, hcr and many others.

-- 
	Brad Templeton - Waterloo, Ont. (519) 886-7304