[comp.sources.misc] v09i083: newsclip 1.1, part 14 of 15

brad@looking.ON.CA (Brad Templeton) (12/20/89)

Posting-number: Volume 9, Issue 83
Submitted-by: brad@looking.ON.CA (Brad Templeton)
Archive-name: newsclip/part14

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 14 (of 15)."
# Contents:  doc/man.mm.1
# Wrapped by allbery@uunet on Tue Dec 19 20:10:09 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'doc/man.mm.1' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'doc/man.mm.1'\"
else
echo shar: Extracting \"'doc/man.mm.1'\" \(51014 characters\)
sed "s/^X//" >'doc/man.mm.1' <<'END_OF_FILE'
X.nr Ej 1
X.nr Hb 4
X.nr Hs 3
X.ds HP +4 +2 +1 +0 +0 +0 +0
X.de xd
X.sp 1V
X.br
X\\$1 \s+2\fB\\$2\fP\s-2
X.sp 0.7V
X..
X.de Bb
X.sp 0.5V
X.in 0.4in
X.nf
X..
X.de Be
X.sp 0.5V
X.in -0.4in
X.fi
X..
X.de St
X.sp 0.8V
X.in 0.4in
X\\$1
X.in -0.4in
X.sp 0.8V
X..
X
X
X
X.H 1 "Introduction"
X.P
XAt the end of the 1970s, when the Unix\*(Tm operating system was maturing
Xand gaining popularity throughout the academic world, some graduate
Xstudents in North Carolina had a very interesting idea.  They hooked
Xtogether multiple Unix computers to form multi-machine \fIbulletin boards\fP --
Xcomputerized conferences.
X.P
XFor some time, software had existed which allowed users of
Xa single machine to access and add to a computerized discussion.  In addition,
Xmulti-machine electronic mail also existed, and some users were already
Xparticipating in multi-machine mailing lists which allowed computerized
Xdiscussion to take place over networks.
X.P
XThe new idea from North Carolina was simple yet powerful.  They created
Xa system that distributed the discussions among machines and shared them
Xmoderately efficiently.  Just as important, anybody who bothered could
Xconnect up and participate.
X.P
XThe result was the exponential explosion known as USENET.   USENET is
Xreally just a great number of computers that share a common file format
Xfor exchanging broadcast electronic messages.   Anybody can write a message
Xon their computer.   Their computer then sends the message to all the
Xcomputers it talks to, and they send it to their friends, and those friends
Xsend it on to theirs, and so on, and so on...
X.P
XThis resulted in the world's largest computerized discussion network.
XLarge as it is, however, this network has no actual existence as an
Xentity.  Its existence is simply the sum of all the computers that are
Xwilling to forward information around to their neighbours.
X.P
XThere are benefits to anarchy, but there are also drawbacks.  On USENET, there
Xare few checks and balances to control postings.  Anybody can
Xpost anything, and quite often they do.  As of this writing, USENET
Xpasses over 3,000 different messages per day, with a volume of over
X7 megabytes each and every day.
X.P
XIn the early days of USENET, participants could, and often did, read just
Xabout everything.   With hundreds of thousands of participants, that is
Xno longer even remotely possible.   USENET messages are broken up into
Xcategories called \fInewsgroups\fP.  All messages are categorized into
Xone or more such groups, and readers only read messages in groups
Xof their choosing.   As time goes by, even the most enthusiastic readers
Xare able to read only a smaller and smaller percentage of the total
Xarray of groups.  There are currently over 600 mainstream groups, and the
Xnumber is growing.  In addition, there are at least 500 alternate groups,
Xincluding the News Wire and electronic publication feeds of ClariNet.
X.P
XIn most cases, even the group selection isn't enough.  Many groups handle
X50 or more messages per day.  Readership estimates suggest that over 1
Xbillion times a month, somebody reads or scans a USENET message.  At
X2 seconds per message, that amounts to some 2 billion seconds a month,
Xor roughly 18,000 man hours of human time per day (at least), worth
Xmillions of dollars per year.
X.P
XAnd that doesn't include the transmission and storage costs at each of
Xthe estimated 15,000 sites.
X.P
XThere's just too much to read, too much to send, and people easily
Xget swamped.
X
X.H 2 "RN"
X.P
XPart of this problem is solved by the \fIkill file\fP feature in one of
Xthe more popular news-reading programs.  This program is known as
X``RN.''  It is the work of Larry Wall and is given away freely.  RN
Xusers can specify \fIkill files\fP, which contain patterns (regular
Xexpressions) and associated commands.   With this, users can arrange
Xto not see articles that have certain strings in the header, the subject,
Xor the entire contents of an article.
X.P
XThis and other news-killing tools have been a great boon on the net,
Xbut no system for filtering news can handle every reader desire.
X.P
XUnless, of course, the system is a programming language.
X.H 2 "NewsClip"
X.P
XThat's what NewsClip is -- a programming language that lets you examine
Xnews articles and assign them a value.  From that value, NewsClip programs
Xcan decide whether you want to read an article or not.
X.P
XWith NewsClip, you can examine articles with simple expressions or
Xarbitrarily complex programs, and control your reading through these
Xprograms.
X.P
XA NewsClip-generated program can filter what you read in a number of
Xways.  In one mode, it can run as a standalone program which examines
Xyour \fB.newsrc\fP, evaluates unread articles, and marks undesired articles
Xas read in the \fB.newsrc\fP.  In another, it can talk directly to some
Xnewsreading programs.   How it does this, however, is unimportant
Xright now.
X.P
XThe core of a newsclip program is the \fBARTICLE\fP procedure.  This is
Xthe routine that gets called for every article that is examined.  The
Xpurpose of this routine is to decide whether to accept or reject
Xan article.
X.P
XThis decision can be arrived at in one of two ways.  Explicit
X\fBaccept\fP and \fBreject\fP statements can judge an article immediately.
X.P
XAlternatively, statements can alter a running \fBscore\fP for the article,
Xmaking adjustments to it until the routine finishes.  In this case, articles
Xthat finish with a score greater than zero are accepted, and those with
Xscores less than or equal to zero get rejected.
X.P
XHere, for example, is a very simple complete NewsClip program that
Xrejects all articles posted to more than one group:
X.Bb
Xprocedure
XArticle() {
X	if( count(newsgroups) > 1 )
X		reject;
X	 else
X		accept;
X}
X.Be
XRight away you may have noticed that the style of the language is similar
Xto that of C.  This has been done so that C programmers will adapt easily.
XIf you're not a C programmer, don't worry,
Xas the language is still simple enough to learn from scratch -- at least
Xfor all the basic ways of examining news articles.
X.P
XYou will also have noticed the pre-declared array \fBnewsgroups\fP and
Xthe special function \fBcount\fP and the special statements \fBaccept\fP
Xand \fBreject\fP.  You'll read more about these later.
X.H 2 "What it Means"
X.P
XYou may already be catching on to what NewsClip means for USENET.
XWith NewsClip you can turn USENET into any sort of network you want.
XSo long as there's enough information in the articles to classify them,
Xyou can read exactly the USENET you want to see, and little more.
X.P
XIf you don't like certain users, you can eliminate them from your reading.
XYou can even eliminate all followups to their postings.  Whole sites
Xcan be eliminated or given priority.
X.P
XInstead of saying what you don't want to see (as with the \fIkill\fP file),
Xyou can specify what you \fIdo\fP want to see.  Or you can specify what
Xyou want for one newsgroup and what you don't want in another.
X.P
XWith the current USENET subscription mechanism, you see everything that's
Xin a group, even if it's also in other groups.  With NewsClip, you can
Xprogram your reading to include only things that are in certain combinations
Xof groups, or exclude those combinations through any expression of boolean
Xlogic you care to code.
X.P
XYou can give priority to articles posted locally, or those posted by your
Xfavourite writers.  You can see all references to your own articles or
Xonly see articles in netwide groups that were posted strictly for your
Xcountry.  All these things can be combined as you like.
X.P
XIf you want to go back to the USENET of yesteryear, you can.  Leave
Xout the new sites or assign negative scores to newcomers.  It's up to
Xyou.
X.P
XYou can even enforce your own USENET posting rules by rejecting articles
Xwith long signatures, or articles that contain mostly included text.
XOr you can do that only for certain users, or certain groups, as you
Xwish.
X.P
XIn other words, USENET becomes, to you, just what you want it to be.
X.P
XWe're pushing on USENET for better posting programs that put more and
Xmore information about the type of article into the article header.
XAs more and more such information becomes available, you'll be able
Xto program better and better criteria for your reading.
X
X.H 2 "Credits"
X.P
XThe NewsClip system was designed by Brad Templeton, and written by
XBrad Templeton and Tim Tyhurst of Looking Glass Software Ltd.
XThe regular expression routines appear courtesy of Henry Spencer of
Xthe University of Toronto.  The USENET date parsing routine appears
Xcourtesy of Steven Bellovin of AT&T.
X.P
XThe manual was written by Brad Templeton.  Problems and correspondence should
Xbe mailed to ``newsclip@looking.on.ca,'' and \fBnot\fP to the authors'
Xprivate mailboxes.
X
X
X
X.H 1 "Basics"
X.P
XA NewsClip program consists of declarations, procedures and functions.
XThere are several procedures which are special, and act as ``entry points,''
Xso that your program takes control at the right times when articles
Xare scanned.
X.P
XA procedure looks like:
X.Bb
Xprocedure \fIprocname\fP( \fItypelist\fP )
X{
X	\fIlocal declarations\fP
X	\fIstatements\fP
X}
X.Be
XYou will see lots of examples shortly.
X.P
XAll the special procedures are optional, except the one named \fBarticle\fP.
XAlmost all programs will have global declarations, which can be
Xinterspersed amongst the procedures and functions, but are normally found
Xat the beginning of the file.
X.P
XMost programs will only use the \fBarticle\fP procedure.  Some may use
Xthe \fBinit\fP and \fBterminate\fP procedures, which are called when the
Xnews filtering program starts and finishes, and others will use the
X\fBstartgroup\fP and \fBendgroup\fP procedures, which are called before
Xand after a new newsgroup is scanned through.  For full details, see the
Xsection on \fIentry point\fP procedures.
X
X.H 2 "Data Types & Declarations"
X.P
XNewsClip comes with a small number of data types designed for use
Xin examining news articles.  They are the same data types that appear
Xin the headers of news articles, namely integers, dates, newsgroups,
Xuserids, strings and arrays of these types.
X.P
XThere's also a special database type, and special symbols that allow
Xreferences to text regions in the body of the article.
X.P
XDeclarations can appear anywhere outside of procedures or functions, 
Xin which case they are global, or at the
Xstart of the code for any procedure or function, in which case they are 
Xlocal to that procedure/function.
X.P
XTo declare a variable, simply give a declaration statement with the
Xname of the type and the name of the variable, followed by a semicolon.
XFor example:
X.Bb
X	int likeit;
X	string thename;
X.Be
X
X.H 2 "First Examples"
X.P
XLet's begin by looking at some simple programs.  As we described in the
Xintroduction, the purpose of a NewsClip program is to accept or reject
Xarticles using the \fBarticle\fP procedure.
X.Bb
Xprocedure
Xarticle() {
X	if( is talk.flame )
X		reject;
X	if( is rec.humor && is talk.bizarre )
X		accept;
X	 else
X		reject;
X}
X.Be
X.P
XThis program introduces us to some of the most important statements in
Xthe NewsClip language, namely \fBaccept\fP and \fBreject\fP.
X.P
XThey cause the article to be accepted or rejected out of hand.  As soon
Xas they are encountered, processing of the article stops.  Thus while an
Xarticle might satisfy one \fBaccept\fP criterion and another \fBreject\fP
Xcriterion, what actually happens depends on which criterion comes first
Xin the program.
X.P
XThus while our example wants to accept articles crossposted to
X``rec.humor'' and ``talk.bizarre'', it doesn't even get to examine those
Xalso posted to ``talk.flame,'' which are rejected by the first statement.
X.P
XAlso introduced here is the \fBif\fP statement.  C programmers will note
Xthat the syntax for it is the same as the one for C, namely:
X.Bb
Xif( \fIcondition\fP )
X	\fIstatement\fP
X.Be
X.P
XIn the \fBif\fP statement, the condition is evaluated.  If it is true
X(which is to say, not equal to zero),
Xthen the following statement is done.  If the condition is not true, the
Xstatement is not done.
X.P
XThis program makes use of a special primitive called \fBis\fP.  The
X\fBis\fP \fIoperator\fP is always followed by a newsgroup name.  It is
Xtrue if the current article was posted to the named newsgroup, and false if it
Xwas not.
X.P
XConditional accept and reject are so common that NewsClip contains
Xsome shorthand forms for them, namely \fBreject if\fP and \fBaccept if\fP.
XThe above program could have been written:
X.Bb
X	reject if is talk.flame;
X	accept if is rec.humor && is talk.bizarre;
X	reject;
X.Be
XYou will have noticed by now that all NewsClip statements end in
Xa semi-colon, as in C.   You can enter your NewsClip statements
Xin any form you like, on single lines or multiple lines, as long as you
Xend them with a semi-colon.
X.P
XThe \fB&&\fP operator performs a logical ``and'' on two conditions.
X\fBa && b\fP is true if, and only if, both \fBa\fP is true and \fBb\fP
Xis true.
X.P
XWith these tools, you can accept and reject articles based on what newsgroups
Xthey are in.  This is like the regular ``subscription'' mechanism of
Xthe original newsreaders, although it is more powerful because it allows
Xthe use of logical expressions.
X
X.H 2 "Compiling"
X.P
X\fI(We assume that your system has already been installed.  System admins
Xand binary licensees should check out the installation appendix for details
Xon how to install.)\fP
X.P
XNow that you know how to put together a simple NewsClip program, you
Xwill want to compile it.  This is done with the command
X\fBncc\fP, for ``NewsClip Compiler.''
X.P
XNormally, you will place your programs in files with the extension
X``\fB.nc\fP,'' just as C programmers use the ``\fB.c\fP'' extension.
X.P
XSay you have a program in the file \fBmyclip.nc\fP.  To compile, simply
Xissue the command:
X.Bb
Xncc myclip.nc
X.Be
XThis will produce an executable program called \fBnclip\fP in the current
Xdirectory.
X
X.H 3 "Executing"
X.P
XThere are many ways that the generated news clipping programs can be used.
XThe way you will probably use first is \fInewsrc\fP mode.  This is meant to
Xwork with news reading programs that keep track of read articles in
Xa file called \fB.newsrc\fP.
X.P
XSimply say:
X.Bb
Xnclip mode=newsrc
X.Be
Xor, more briefly:
X.Bb
Xnclip m=n
X.Be
Xand your \fBnclip\fP program will examine the \fB.newsrc\fP file to
Xlook for unread, unprocessed articles.  It will then examine all these
Xarticles with your filtering program.  Rejected articles will be marked
Xas read, and accepted articles will be marked unread.  When it is done,
Xthe \fB.newsrc\fP file will be updated.
X.P
XWhen you next run your newsreading program, you will not see the rejected
Xarticles.
X.P
XYour \fBnclip\fP program has a number of other modes of operation.  One
Xtakes a list of article filenames and filters out the rejected ones.
X.P
XIn another mode, your program can talk to specially adapted newsreader
Xprograms through pipes, allowing the filtering to take place as you
Xread news.  This makes your filter program seem part of your newsreader.
X.P
XThese other modes are detailed in a special chapter.  For now, just run
Xyour \fBnclip\fP program as described above before your newsreading
Xsession.  You should perhaps make a shell script or shell alias to do
Xthis for you:
X.Bb
Xnclip mode=newsrc
Xrn
X.Be
X.H 3 "List Mode"
X.P
XWhen you are starting off, you may not wish to run your test NewsClip
Xprograms on your real \fB.newsrc\fP file.  One way to protect it is
Xto copy it elsewhere before running tests.   Another idea is to run
Xyour clipping program in \fIlist\fP mode with:
X.Bb
Xnclip mode=list
X.Be
X.P
XIn this mode, the \fB.newsrc\fP file is examined as always, but is not
Xupdated.
XInstead, a list of the filenames of accepted articles is written to the
Xstandard output.  You can look at these files yourself to see if your
Xprogram is doing what you want.
X.H 2 "More Examples"
X
X.H 3 "Externals"
X.P
XTo do much more than this, you will have to learn how to declare and,
Xin particular, to import variables.   Most of the information about a
Xnews article is contained in the header.  With each article, NewsClip
Xexamines the header,
Xstoring the various fields you are interested in
Xinto variables which you can use in your expressions.
X.P
XTo use these variables, you must ``import'' them with the \fBextern\fP
Xdeclaration.
X.Bb
Xextern userid From;		/* the From: line */
Xprocedure
Xarticle() {
X	if( From == "brad@looking.on.ca" )
X		accept;
X	 else
X		reject;
X}
X.Be
XHere we see both the use of an imported \fBuserid\fP variable, and
Xthe concept of string comparison.  \fBUserid\fP variables, like \fBFrom\fP,
Xact like \fBstring\fP variables in just about every way.   In this case,
Xa \fBuserid\fP variable is compared with the constant string that is the author's mail address.
X.P
XThis program would accept any article posted by me, and reject all others.
XWise choice!
X
X.H 3 "If-else"
X.P
XIn this example, you also see that \fBif\fP has an \fBif-else\fP form
Xsuch as C programmers might expect.  If the condition is true, the statement
Xafter the
X\fBif\fP is executed.  Otherwise, the statement after the \fBelse\fP
Xis executed.
X.H 3 "Comments"
X.P
XYou may also have noticed the comment on the declaration.  Comments start
Xwith \fB/*\fP and end with \fB*/\fP.  They may be placed anywhere
Xwithin a program.  They are ignored by the compiler.  Note that, as in
XC, you may not nest comments.  The very first \fB*/\fP within a
Xcomment terminates the comment.
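X.P
XAs a quick illustration, a comment may run over several lines, but an
Xattempt to nest one comment inside another ends at the first \fB*/\fP:
X.Bb
X/* A comment may span
X   several lines. */
X/* outer /* inner */ this text is no longer inside a comment */
X.Be
XIn the last line, the words after the first \fB*/\fP would be read as
Xprogram text, which would most likely produce a syntax error.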
X.H 3 "More Externals"
X.P
XThe \fBFrom\fP variable is just one of many special ``header'' variables
Xwhich you can import with the \fBextern\fP declaration.   There are
Xspecial header variables for each common USENET news header line.  Each
Xhas the same name as the header name, except dashes are replaced with
Xunderscores where necessary.
X.P
XYou must declare these header variables with the appropriate type.  For
Xexample, \fBFrom\fP is always a \fBuserid\fP, while \fBmessage\_id\fP
Xis always a \fBstring\fP.  The other major header fields that are commonly
Xused are declared by:
X.Bb
Xextern string subject;		/* The subject line */
Xextern string message\_id;	 /* Unique message identifier */
Xextern int lines;		/* number of lines in article */
Xextern string array references; /* message-ids of parent articles */
Xextern newsgroup array newsgroups;	/* the newsgroups line */
X.Be
XImportation of the last one, \fBnewsgroups\fP, is optional.  It's
Xalways available whether you declare it or not.
X.P
XHeader variables may only be imported at the global level, because importing
Xthem has global implications.   Other variables may be imported both
Xat the global level, and within procedures and functions.
X.H 3 "Array"
X.P
XTwo of the above header variables are arrays.  In NewsClip, you can
Xhave arrays of any of the 5 basic types.  They are declared by using
X``\fItypename\fP \fBarray\fP'' in place of a simple \fItypename\fP,
Xas shown above, for example, in \fBstring array references\fP.
X.P
XArrays can be indexed, their size can be taken, and you can test for
Xthe presence of other values (such as single scalar variables) in an
Xarray.  You will find them very useful in certain applications.
X.P
XFor now, the only arrays you can use are pre-defined header array
Xvariables.  Later, you will read how to create your own arrays.  If
Xyou declare an array and use it without giving it a dimension, your
Xprogram may bomb.
X
X.H 3 "Variables & Assignment"
X.P
XYou can, of course, declare your own variables, either globally,
Xor locally in each routine, including
Xthe \fBarticle\fP routine.   Just like in any programming language,
Xthese variables can be used and assigned to.
X.P
XThe simple assignment statement looks like:
X.Bb
X\fIvariable\fP = \fIexpression\fP;
X.Be
X.P
XTwo other assignment operators are the increment and decrement operators,
Xwhich can only be used on integer variables.  \fBvar++;\fP is the same
Xas \fBvar = var + 1;\fP, while \fBvar-\-;\fP is the same as
X\fBvar = var - 1;\fP.
X.P
XUnlike C, these assignment forms can only be used in independent
Xstatements.  They \fBcan't\fP be used in the middle of an expression.
XFor example, this means that the common C habit of using an assignment as
Xthe condition of an \fBif\fP statement (i.e., \fBif( var = a + b )\fP)
Xis not allowed.
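X.P
XHere is a short fragment showing the legal forms next to the forbidden ones
X(the variable name \fBhits\fP is only illustrative):
X.Bb
Xint hits;
Xhits = 0;
Xhits++;			/* fine: a statement of its own */
Xhits = hits + 2;	/* fine: ordinary assignment, hits is now 3 */
X/* if( hits++ > 3 ) ...   is NOT allowed */
X/* if( hits = 3 ) ...      is NOT allowed */
X.Be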
X
X.H 3 "Identifiers"
X.P
XYou may have noticed that in NewsClip, just about anything, except
Xfor constant strings, can be in any case.  All upper case in identifiers
Xand keywords is mapped to lower case by the compiler.  This is not like
XC, where the very similar looking \fBHello\fP and \fBhello\fP can be two
Xdifferent variables.
X.P
XOtherwise, identifiers must start with a letter, and may consist of letters,
Xdigits and the underscore character.  Names are significant to one less
Xcharacter than your C compiler allows.  Most modern C compilers have
Xno limit.   Some older ones have a limit of 7.  If yours does, get a new
XC compiler.
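X.P
XFor example, this sketch imports the \fBFrom\fP header variable and then
Xrefers to it as \fBfrom\fP.  Because case is folded, both spellings name
Xthe same variable, and \fBARTICLE\fP is the same procedure as \fBarticle\fP.
XOnly the constant string keeps its case exactly as written.
X.Bb
Xextern userid From;
Xprocedure
XARTICLE() {
X	accept if from == "brad@looking.on.ca";	/* same variable as From */
X	reject;
X}
X.Be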
X
X.H 2 "Data Types"
X.P
XBefore going further, a more formal discussion of the NewsClip data
Xtypes is in order.
X.H 3 "Integer"
X.P
XThe \fBint\fP type and the \fBdatetime\fP type are integer types.  The
Xsize of the \fBint\fP is machine dependent.  The size of the \fBdatetime\fP
Xtype will always be large enough to hold date values, which are the number
Xof seconds since midnight GMT, Jan 1, 1970.
X
X.H 3 "String"
X.P
XThe \fBstring\fP type declares variables that are character strings.  You
Xcan compare them with other strings, pass them as arguments to
Xsubroutines, search for patterns in them or use them as search patterns.
XTwo important things to remember about strings are:
X.AL
X
X.LI
XThey are usually temporary, and only last for the duration of the processing
Xof a single article.
X.LI
XString variables actually just point to strings; they aren't the strings
Xthemselves.  If you assign one string variable to another, they both refer
Xto the very same physical string in memory.
X.LE
X.H 3 "Userid"
X.P
XVariables of type \fBuserid\fP are special forms of strings.  They are only
Xderived from header lines like the \fBFrom:\fP line of an article.  If
Xyou refer to a \fBuserid\fP variable, you will get the string that is
Xthe return mail address section of the \fBFrom:\fP line.   With the
X\fBrealname\fP function, you can extract the user's ``full name.''
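X.P
XThe following is a minimal sketch that assumes \fBrealname\fP takes a \fBuserid\fP as its one
Xargument and returns the full name as a string.  Check the function
Xreference for its exact form.  The address and name below are hypothetical:
X.Bb
Xextern userid from;
Xprocedure
Xarticle() {
X	/* from gives the mail address, realname(from) the full name */
X	reject if from == "ihate@bad.site.com";
X	reject if realname(from) == "Joe Flamer";
X	accept;
X}
X.Be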
X.H 3 "Newsgroup"
X.P
XThe \fBnewsgroup\fP type is a special and important type.  Internally,
X\fBnewsgroup\fP variables are integers.  When NewsClip programs
Xprocess news articles and news files, all relevant newsgroups are
Xassigned unique newsgroup numbers.  These numbers are what get stored
Xin \fBnewsgroup\fP variables.
X.P
XThe unique thing about \fBnewsgroup\fP values is that you can use them
Xin expressions wherever a string is required.  If you do, the newsgroup
Xnumber will be mapped to the string that is the newsgroup name.  Thus
Xyou can compare a string with a newsgroup, assign a newsgroup to a
Xstring variable, search text using a newsgroup as a string, or even search
Xfor patterns in a newsgroup or array of newsgroups.
X.P
XThe only thing you can't do is assign a string value to a newsgroup
Xvariable.
X.P
XNewsgroup constants can be expressed in the source code by prefacing a
Xnewsgroup name with a ``\fB#\fP.''  These constants get replaced with
Xthe appropriate newsgroup number.  For example:
X.Bb
Xnewsgroup mygroup;
Xmygroup = #rec.humor.funny;
X.Be
XWhile there is no official definition of what characters can go in
Xa newsgroup name, NewsClip allows only alphanumerics and the dot,
Xdash (minus), underscore and plus characters.
X.P
XYou can also assign a newsgroup value to an integer variable to extract
Xthe newsgroup number, although this is not usually useful.
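X.P
XThe following fragment sketches these conversions.  The variable names
Xare only illustrative, and \fBnewsgroups\fP is the header array described
Xearlier:
X.Bb
Xnewsgroup first;
Xstring name;
Xint num;
Xfirst = newsgroups[0];		/* a newsgroup value */
Xname = first;			/* fine: name is now the group's name */
Xnum = first;			/* fine, but just an internal number */
Xaccept if first == "rec.humor.funny";	/* fine: compared as a string */
X/* first = "rec.humor.funny";  is NOT allowed */
X.Be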
X
X.H 3 "Array"
X.P
XAs noted, you can declare arrays of any of the above types.  Normally
Xyou will not have to define your own arrays.  The only ones you are likely
Xto use are the special array header variables.
X.P
XYou can index into arrays with square brackets, as in C.  For example,
X\fBnewsgroups[0]\fP is the first newsgroup on an article's newsgroups
Xline.  Array indices should be integer expressions, and the first element
Xin an array is index 0.  No check is made on indexing.  If you index beyond
Xthe bounds of an array, your program can crash.
X.P
XBefore using a user-declared array, you must dimension it with a statement
Xlike:
X.Bb
Xmyarr = array 10;
X.Be
XYou can provide any integer expression for the size.  Indices go
Xfrom 0 to 9 in the above case.
X.P
XYou can get the size of an array with the \fBcount\fP operator.  As
Xshown in the very first sample program in the introduction,
X\fBcount(newsgroups)\fP gives the number of newsgroups an article was
Xposted to.
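X.P
XThe following sketch puts the declaration, dimensioning and \fBcount\fP together.
XThe array name is only illustrative:
X.Bb
Xint array squares;
Xint i;
Xsquares = array 10;			/* valid indices are 0 through 9 */
Xfor( i = 0; i < count(squares); i++ )	/* count(squares) is 10 */
X	squares[i] = i * i;		/* squares[9] ends up as 81 */
X.Be
XThe \fBfor\fP loop used here is described later in this chapter.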
X.P
XThere is a special variant of the \fBfor\fP loop (described later) that
Xworks with arrays, and the new \fBin\fP and \fBhas\fP operators are
Xdesigned to work with arrays.  More on these later.
X.H 3 "Database"
X.P
XThe \fBdatabase\fP type is not used in any header items, but you will
Xfind it very useful in keeping track of collections of items, and for
Xremembering information from past articles.  The \fBdatabase\fP is described
Xin a special section of its own.
X
X.H 2 "For Loop"
X.P
XNewsClip comes with two kinds of for loops.  The first is just like the
Xone from C.  The syntax is:
X.Bb
Xfor( \fIassignment\fP ; \fIcondition-expr\fP ; \fIassignment\fP )
X	\fIstatement\fP;
X.Be
XThe first assignment statement gets executed once, at the start of the loop.
X.P
XThen, for each execution of the loop, the \fIcondition\fP is evaluated.  If it
Xis true, the loop \fIstatement\fP is executed.
XIf it is false, the loop terminates.
XNote that if the condition is false at the very start, the loop code is
Xnever executed.
X.P
XEach time the \fIstatement\fP is executed, the second \fIassignment\fP is
Xexecuted.  After that the \fIcondition\fP is evaluated again to see if
Xthe loop should continue.
X.P
XA typical use of this is a counting loop:
X.Bb
Xfor( counter = 0; counter < 10; counter++ )
X	myarray[counter] = counter * counter;
X.Be
X.P
XThe second form of the \fBfor\fP loop may be more useful in dealing with
Xarrays and databases.  The syntax is:
X.Bb
Xfor( \fIvariable\fP in \fIarray/database\fP )
X	\fIstatement\fP;
X.Be
XIn this case, the statement is executed for every value in the array or
Xdatabase.
X(See the database chapter for more details on the latter.)   With each
Xexecution, the variable is assigned the value of successive entries in
Xthe array.
X.P
XThis means, of course, that the type of the variable must be the same as the
Xconstituent type of the array.
X.P
XFor example:
X.Bb
Xfor( n in newsgroups )
X	reject if n == #talk.flame;
X.Be
Xis the same as:
X.Bb
Xfor( i = 0; i < count(newsgroups); i++ )
X	reject if newsgroups[i] == #talk.flame;
X.Be
XIt's just simpler and more readable.
X
X.H 3 "While Loop"
X.P
XThere's also a \fBwhile\fP loop, intended for more advanced programming.
XIts form is identical to that of the C language \fBwhile\fP loop, and it is
Xdescribed in the \fBstatements\fP section of the manual.
X
X.H 2 "Compound Statements"
X.P
XWhere we have described the \fBfor\fP and \fBif\fP statements above,
Xonly a single statement has been shown as affected by the condition or
Xthe loop.
X.P
XQuite often you will want your loops and conditions to control more
Xthan a single statement.  For this, you
Xuse the ``compound'' statement, which combines multiple statements into
Xa single unit.
X.P
XTo form one, put curly braces around the group of statements, like so:
X.Bb
X{
X	n = #talk.bizarre;
X	accept if subject == "the rain in spain" && newsgroups[0] == n;
X}
X.Be
XUsually you will do this with something like an \fBif\fP, as in:
X.Bb
Xif( is news.admin ) {
X	accept if from == "brad@looking.on.ca";
X	reject if from == "ihate@bad.site.com";
X	reject if lines >100;
X	}
X.Be
XWhere you place your braces, and how you indent, is up to you.  I prefer
Xthe above style, but the choice of style is really a matter of ``religion.''
X.H 2 "Summary"
X.P
XWith these tools, and the \fBswitch\fP statement described in the
Xnext chapter, you will be ready to construct a typical NewsClip
Xprogram to control your newsreading.
X
X
X
X.H 1 "A Typical Program"
X.P
XIn the previous chapters, you learned the basics of how
Xa NewsClip program operates.   You are now ready to write a
Xmore sophisticated clipping program.
X.P
XAs discussed before, the purpose of your program is to accept or
Xreject articles.  You should decide whether you are going to do this
Xby means of explicit \fBaccept\fP and \fBreject\fP statements, or
Xwhether you would like to use a series of statements that calculate
Xa ``value'' or \fBscore\fP for each article.
X.P
XIf you are going to calculate a \fBscore\fP, you will use the \fBadjust\fP
Xstatement.  The \fBadjust\fP statement adds the expression provided to
Xthe article's score.  For example:
X.Bb
Xif( is talk.bizarre )
X	adjust -10;
X.Be
Xsubtracts 10 points from any article posted to talk.bizarre.
X.Bb
Xif( from == "fbaggins@shire.midearth" )
X	adjust 30;
X.Be
Xadds 30 points to any article posted by Frodo Baggins.
X.P
XYour \fBarticle\fP procedure might consist of a series of conditional
Xadjustments to the score of the article.  The score starts with a value
Xof one (1), which means that the default is to accept.  If, at the end,
Xthe score is still 1 or higher, the article is accepted.  Otherwise
Xit is rejected.
X.P
XIf you are using the \fBscore\fP method, you can still use explicit
X\fBaccept\fP or \fBreject\fP statements at any time.
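X.P
XTo see how the adjustments add up, consider this sketch.  It combines
Xthe two adjustments above with a hypothetical penalty for long articles:
X.Bb
Xextern userid from;
Xextern int lines;
Xprocedure
Xarticle() {
X	/* the score starts at 1 */
X	if( from == "fbaggins@shire.midearth" )
X		adjust 30;
X	if( is talk.bizarre )
X		adjust -10;
X	if( lines > 200 )
X		adjust -25;
X}
X.Be
XA 50-line article by Frodo in talk.bizarre ends with 1 + 30 - 10 = 21 and
Xis accepted.  The same article at 250 lines ends with 21 - 25 = -4 and is
Xrejected, as is a 250-line article by anyone else in talk.bizarre
X(1 - 10 - 25 = -34).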
X
X.H 2 "Sections & Switch"
X.P
XA typical clipping program's \fBarticle\fP procedure will consist of
Xboth global conditions and special conditions that only apply in certain
Xnewsgroups.  While you could do the latter with a series of statements
Xlike this:
X.Bb
Xif( group == #news.admin ) {
X	/* statements for news.admin */
X	}
Xelse if( group == #news.groups ) {
X	/* statements for news.groups */
X	}
Xelse ...
X.Be
Xthis is bulky and inefficient.
X.P
XThe \fBswitch\fP statement lets you make multi-way decisions based on
Xthe value of a single expression.  With \fBswitch\fP, you can test whether
Xthe value of an expression matches a variety of constant values, and execute
Xa different piece of code for each constant value.
X.P
XA typical switch looks like:
X.Bb
Xswitch( \fIexpression\fP ) {
X	case \fIconst1\fP:
X		\fIstat1\fP;
X		break;
X	case \fIconst2\fP:
X		\fIstat2\fP;
X		\fImore stats\fP;
X		break;
X	default:
X		\fIdefault-stats\fP;
X		break;
X	}
X.Be
XIt is essentially a large compound statement with ``case labels'' inside.
XThe program jumps to the right case label for the value of the \fBswitch\fP
Xexpression, and executes on down from there until it hits a transfer
Xstatement like \fBbreak\fP, \fBaccept\fP, \fBreject\fP, etc.
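X.P
XThis sketch, which assumes \fBnewsgroups\fP is declared as in the sample
Xprogram below, shows what that means in practice.  There is no \fBbreak\fP
Xafter the first case, so an article whose primary newsgroup
X(\fBnewsgroups[0]\fP) is sci.astro gets 10 points, while one whose primary
Xnewsgroup is sci.physics gets 5:
X.Bb
Xswitch( newsgroups[0] ) {
X	case #sci.astro:
X		adjust 5;
X		/* no break: execution continues into the next case */
X	case #sci.physics:
X		adjust 5;
X		break;
X	}
X.Be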
X.P
XYou will see a switch, in this case on the newsgroup, in our sample program
Xbelow.
X.H 2 "The Example"
X.P
XA typical program will contain a declarations section where you declare
Xheader variables and other externals, as well as your own global variables.
X.P
XThis will be followed by an article section.  At the top of the article
Xsection, you can declare local variables, and perform any per-article
Xinitialization of variables that is required.
X.P
XThis should be followed by the tests you want to make on all articles,
Xregardless of newsgroup.
X.P
XAfter this, you will want to place a \fBfor\fP loop that scans each newsgroup
Xin the \fBnewsgroups\fP array.  For each newsgroup, you will want to
Xperform conditional tests based on the newsgroup and what you would like to
Xsee or not see within it.  You do this with a large \fBswitch\fP statement
Xand individual \fBcase\fP statements for each newsgroup constant.
X.P
XAfter the \fBfor\fP loop and its \fBswitch\fP, you can include any further
Xtests which you wish to apply to all articles that have not already been
Xaccepted or rejected.
X.P
XHere is a sample program.  In this program, you'll also see the new \fBhas\fP
Xoperator, which does regular expression pattern matching.
X
X.Bb
Xextern string subject;
Xextern userid from;
Xextern newsgroup array newsgroups;
Xextern int followup;			/* is it a followup? */
Xprocedure
Xarticle() {
X	newsgroup n;
X	/* every article code */
X	/* I like articles from this guy */
X	if( from == "goodguy@nice.site.com" )
X		adjust 10;
X	/* but I prefer to avoid followups */
X	if( followup )
X		adjust -6;
X	/* if the article is in alt.flame, forget it.  This could
X	   also be in the case section */
X	if( is alt.flame )
X		adjust -8;
X	/* now the newsgroup specific stuff */
X	for( n in newsgroups ) switch( n ) {
X		case #news.admin:
X			if( subject has "voting" )
X				adjust 20;
X			else if( from has "badsite.edu$" )
X				reject;
X			break;
X		case #rec.humor:
X			/* adjust the score of messages that are crossposted
X				to groups you don't like */
X			if( is talk.bizarre || is alt.flame )
X				adjust -10;
X			break;
X		case #sci.physics:
X			/* I only want to see messages that are crossposted
X			   to both sci.physics AND sci.astro, not just one
X			   of them */
X			reject if !is sci.astro;
X			accept;
X			/* no break; needed after an unconditional accept */
X		case #comp.risks:
X		case #rec.arts.sf-lovers:
X			/* my favourite groups */
X			adjust 20;
X			break;
X		case #alt.sex:
X			reject;	/* don't show me anything here */
X		default:
X			/* in the other groups, if the article is heavily
X			   crossposted, drop a point for each group it is
X			   crossposted to. */
X			if( count(newsgroups) > 3 )
X				adjust -count(newsgroups);
X			break;
X		}
X}
X.Be
X.P
XNote that because of the \fBfor\fP loop on the \fBnewsgroups\fP array,
Xthe \fBswitch\fP gets executed for each group.  That means that
Xif the article is crossposted to 5 groups, 5 different cases will get
Xexecuted.   Of course, if any of the cases does an \fBaccept\fP or
X\fBreject\fP, the processing stops right there.
X.P
XAlternatively, you may prefer to skip the \fBfor\fP loop and simply
X\fBswitch\fP on the variable \fBmain\_newsgroup\fP -- the current
Xnewsgroup in a \fB.newsrc\fP scan or newsreader session.  You might
Xalso switch on \fBnewsgroups[0]\fP, the primary newsgroup in the
X\fBnewsgroups\fP list.
X.P
XSome sample template programs are included with the NewsClip system
Xto get you going.  They can be found in a special directory that
Xwas created by the person on your machine who installed the program.
XThat directory might be \fB/usr/lib/news/newsclip\fP or it could be
Xsomewhere else.   Check the ``man'' page for the ``nclip'' program
Xby issuing the shell command \fBman nclip\fP to find out what
Xdirectory to look in.
X.P
XA simple shell like the one above can be found in the file \fBshell.nc\fP
Xin that directory.  More complex programs can also be found there.
X.P
XFrom this point, all you have left to learn in order to write sophisticated
Xnews clipping programs are the various special variables, functions and
Xoperators that help you make your conditional expressions.  These
Xare documented, along with other tips and tricks, in the following
Xchapters.
X
X
X
X
X.H 1 "Operators & Searching"
X
X.H 2 "Integer Operators"
X.P
XYou have already been shown examples of a variety of integer operators
Xwithout explanation.   That's because we expect the use of these operators
Xto be fairly obvious to anybody who has done a little programming.
X.P
XNewsClip allows integer expressions just like C.  In fact, the priorities
Xof the operators are exactly the same.
X.P
XOperators allowed are \fB+\fP (addition), \fB-\fP (subtraction and unary
Xnegation), \fB*\fP (multiplication), \fB/\fP (integer division) and
X\fB%\fP (integer modulus).
X.P
XThere are also some bitwise integer operators, namely \fB|\fP (bitwise or),
X\fB&\fP (bitwise and) and \fB^\fP (bitwise exclusive or).
X.P
XLogical operators include \fB!\fP (unary not), \fB&&\fP (logical and)
Xand \fB||\fP (logical or).  The logical ``and'' and ``or'' operators
Xshort-circuit, as in C: evaluation stops as soon as the result is known.
XThus if you have \fBa && b\fP, \fIa\fP is evaluated, and if it is false,
X\fIb\fP is not evaluated, as it can't change the result of the expression.
XLikewise, in \fBa || b\fP, \fIb\fP is not evaluated if \fIa\fP is true.
X.P
XLogical arithmetic is done with integer values.  Non-zero values represent
Xtrue, and the value zero represents false.  In most cases, true is represented
Xby one.  In fact, there are two predefined constants, \fBtrue\fP and
X\fBfalse\fP, which are 1 and 0, respectively.
X.P
XThere are several comparison operators.  They return 1 or 0 depending on
Xwhether the comparison is true or false.  For integers, the operators include
X\fB==\fP (equality), \fB!=\fP (inequality), \fB>\fP (greater than),
X\fB<\fP (less than), \fB>=\fP (greater or equal) and \fB<=\fP (less or
Xequal).
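X.P
XSince a comparison yields 1 or 0, its result can also be used in
Xarithmetic.  In this sketch, which assumes \fBnewsgroups\fP is declared as
Xin the sample program above, an article loses 5 points only if it is posted
Xto more than 3 newsgroups:
X.Bb
Xadjust -5 * (count(newsgroups) > 3);	/* -5 times 1 or 0 */
X.Be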
X.P
XString, userid and newsgroup values can also be tested for equality or
Xinequality.  String equality is case sensitive, but most strings from
Xheaders are converted to lower case in advance so that caseless matching
Xcan be done.  When testing for string equality on variables like
X\fBfrom\fP, always use lower case strings.  For example,
X\fBfrom == "foo@bar.uucp"\fP will work while \fBfrom == "foo@bar.UUCP"\fP
Xwill never match.
X
X.H 2 "Pattern Matching (has)"
X.P
XAn important tool in judging news articles is ``searching'' or
X\fIpattern matching\fP.  Most
Xof an article is simply strings and text, and often you will wish to
Xfind out if certain words, strings or phrases are contained within
Xthis text.
X.P
XPattern matching is done with the \fBhas\fP operator.  You can ask
Xwhether a string ``has'' a pattern, which is to say whether something
Xthat matches the pattern is contained within the string.  For example,
X.Bb
Xsubject has "fusion"
X.Be
Xis true if the word ``fusion'' appears anywhere in the \fBsubject\fP, and
Xfalse otherwise.
X.P
XWhile you can get a lot done searching for words and substrings like
X``fusion'', NewsClip actually supports searching for much more
Xcomplex patterns, defined by a language known as ``regular expressions.''
X.P
XIf you have used any of the popular Unix text editors such as ``vi'' or
Xeven ``ed,'' you will be somewhat familiar with regular expressions.
XRegular expressions allow you to search not just for a simple string,
Xbut variations of the string.
X.P
XNewsClip uses the regular expression language used by the Unix searching
Xcommand \fBegrep\fP.  This is a superset of the language used by the
X\fBed\fP text editor.   If you don't know about regular expressions
Xat all, we advise you to read about them in the manual for \fBed\fP
Xor \fBgrep\fP.  If you are already familiar with \fBed\fP, then
Xyou can read the description of the extensions to that language found
Xat the end of this chapter.
X.P
XNote that the ``or'' operator (\fB|\fP) in regular expressions is
Xcurrently not very efficient.
X.P
XIn general, the characters in the set ``\fB^$.[]()+?|\\*\fP'' have
Xspecial meanings.  If you want to match one of those characters
Xliterally, you must precede it with a backslash.  For reasons explained
Xunder ``Typing in Patterns'' below, if you want to match a literal backslash,
Xyou must use four (4) backslashes in quoted pattern strings.
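X.P
XFor example, an unescaped dot matches any character.  To find the literal
Xversion number ``1.1'', or a subject that ends in a question mark, escape
Xthe special character:
X.Bb
Xsubject has "1\\.1"	/* matches 1.1, but not 101 or 1x1 */
Xsubject has "\\?$"	/* matches a subject ending in ? */
X.Be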
X
X.H 3 "Searching"
X.P
XThe most common place to search will probably be the \fBsubject\fP line.
XThis is easy, and an example is shown above.  You can also search
X\fBuserid\fP variables (although you will only search the mail address
Xpart) or other strings.   You can also search a \fBnewsgroup\fP variable,
Xsince they are always converted to strings when necessary.
X.P
XIt is possible to search in arrays.  For example,
X\fBnewsgroups has "^comp"\fP will tell you if any of the newsgroup names
Xstarts with ``comp'', i.e., if any are in the computer hierarchy.
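X.P
XWhere \fBis alt.flame\fP tests for that one exact newsgroup, a pattern
Xsearch of the array also catches its subgroups.  As a sketch:
X.Bb
X/* any newsgroup whose name begins with alt.flame */
Xreject if newsgroups has "^alt\\.flame";
X.Be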
X.P
XYou can also search databases -- see that chapter for more details.
X.P
XPatterns need not be constant strings.  They can also be variables and
Xexpressions of the string, userid or newsgroup types.   In fact, they
Xcan even be arrays of those types.
X.P
XIf you search for an array of patterns, you will get a true result if
Xany of the patterns is found in the area you're searching.   You can
Xeven search for an array of pattern-strings in another array of
Xstrings.
X.P
XSearching has all sorts of general uses.  For example,
X\fBfrom has "@bad.site.edu$"\fP tells you if the article was posted
Xfrom the specified site, \fBsubject has "^re:"\fP tells you if the
Xsubject was generated in a followup, and \fBdistribution has "^usa"\fP
Xtells you if one of the article distributions was the USA's national one.
X.P
XNaturally, you can combine searches with logical operators.  For example,
X.Bb
X\fBsubject has "star wars" && subject !has "reagan|sdi"\fP
X.Be
Xmight help you
Xfind articles about the movie, but not about the laser defence system.
X\fB!has\fP, if you didn't guess, checks to see if a pattern is \fBnot\fP
Xin the specified string.
X.P
XNote that pattern matching is case sensitive.  Most header items you
Xwill search, such as the subject, summary and from lines, are lowercased
Xin advance.  You should thus only search for lower case letters in such
Xregions.  If you set the \fBpreserve\_case\fP flag, mapping of header
Xsections to lower case will not be done.
X.Bb
Xextern int preserve\_case;
Xpreserve\_case = true;
Xaccept if subject has "Reagan";
X.Be
X
X.H 4 "Typing in Patterns"
X.P
XPatterns will normally be constant strings.  This presents an
Xinteresting problem when attempting to escape special characters.
X.P
XFirst of all, it is possible to include escape sequences, which start
Xwith backslash, inside constant strings.  For example, you can insert
Xa quote with \fB\\"\fP, and things like newlines, tabs and general
Xcharacters with strings like \fB\\n\fP, \fB\\t\fP and \fB\\007\fP,
Xall the standard escapes for constant strings in the C language.
X.P
XThe backslash character is also used in patterns to escape regular
Xexpression metacharacters like ``\fB$\fP'' and ensure they get matched
Xas literal characters.  If we followed the C rules, you would need to
Xtype two backslashes before everything you wish to escape.
X.P
XFortunately the C escape characters and the pattern matching metacharacters
Xdon't overlap, except in one place, the backslash itself.
X.P
XThus if you type something like \fB\\$\fP in your constant string, it is
Xmapped to a real backslash and a real dollar sign, instead of just a dollar
Xsign the way C would.   You can thus type in your patterns much the
Xsame way as you would type them to a text editor.  If you want to type
Xtwo backslashes before your \fB$\fP, you still can, though.
X.P
XThe one problem is backslash itself.  It needs to be escaped in both
Xmethods.  This means that if you want to get a pattern that matches
Xa real backslash, you need four (count 'em, four) backslashes.  Fortunately,
Xyou don't have to do this a lot.
X.P
XExamples:
X.Bb
Xstr has "abc$"		/* matches abc at end of line */
Xstr has "abc\\$"	 /* matches abc$ */
Xstr has "abc\\\\$"	  /* also matches abc$ */
Xstr has "abc\\\\\\\\$"	    /* matches abc\\ at end of line */
X.Be
X
X.H 3 "Searching the Body"
X.P
XYou may wish to search for more than just items from the article header.
XIf you want to really check over an article, you can perform searches in
Xthe body or ``contents'' of the article.
X.P
XThe NewsClip language defines 5 different regions of the article body
Xthat you can search in.  Special pre-declared names have been given to
Xthese regions.  These names can only be used with the \fBhas\fP operator
Xand some special functions.
X.P
XThese names are:
X.sp 0.5V
X.in 0.4in
X
X.sp 0.7V
X.ti -0.4in
Xbody
X.P
XThe entire text of the article, not including the header, but including
Xthe signature.
X.sp 0.7V
X.ti -0.4in
Xtext
X.P
XThe text of the article, not including the signature.
X.sp 0.7V
X.ti -0.4in
Xsignature
X.P
XThe signature of the article, but none of the text.
X.sp 0.7V
X.ti -0.4in
Xnewtext
X.P
XThe text of the article that is original, which is to say not included
Xfrom a previous article.  The signature is not included.
X.sp 0.7V
X.ti -0.4in
Xincluded
X.P
XThe text of the article that was included from a previous or parent article.
X.in -0.4in
X.P
XFor example, \fBtext has "hello"\fP would return true for articles that had
Xthe word ``hello'' anywhere in their text, but not if it appeared in the
Xsignature section of the article.
X.Bb
Xreject if body has "compact disc";
X.Be
Xwould reject all articles that contain that string.
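X.P
XChoosing the right region matters.  In this sketch, a poster who merely
Xquotes the phrase from another article is not caught, because only the
Xoriginal text is searched:
X.Bb
X/* newtext excludes both included lines and the signature */
Xreject if newtext has "make money fast";
X.Be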
X
X.B "Important Note:"
X.P
XPattern matching is case sensitive, but normally all sections of the
Xarticle body are lower cased before searching is done.  Thus you should
Xspecify all your search strings in lower case.  There is an integer
Xvariable named \fBpreserve\_case\fP, which if declared and set true,
Xstops the mapping of the article body and various header sections to lower
Xcase.  If you set this
Xflag, your patterns must be in the exact case you're looking for.
X
X.H 3 "Body Parts"
X.P
XAs you may know, there are no official definitions on USENET for what
Xseparates an article's text from its signature, or even for what distinguishes
Xlines included from a previous article from original lines.
X.P
XIn order to make these distinctions, the NewsClip text processing library uses
Xsome special patterns to spot included lines and the start of a
Xsignature.
X.P
XThe default signature pattern is a line that starts with 2 dashes, and is
Xfollowed by any number (including 0) of spaces and the end of the line.
XThe regular expression is ``\fB^-\- *$\fP.''  You can change this
Xpattern with the special procedure \fBset_signature_start\fP.
XYou might try
X.Bb
Xset\_signature\_start( "^(-\-\-*|====*)$" );
X.Be
Xto match lines of dashes or
Xequals signs -- whatever suits your fancy.
X.P
XIncluded lines are deemed to be those that start with a special pattern.
XThe default is the pattern ``\fB[>:%#]\fP,'' meaning any one of those
Xfour characters.  The styles used vary from user to user, and no one style
Xwill be correct.  If you want to get really fancy, you can set the pattern
Xaccording to the poster using \fBset\_include\_prefix\fP.   It is perhaps
Xsafest just to use the ``>'' character, which is the default used by
Xmost news posting programs.  You do not need to worry about white space
Xin front of the pattern.  That is removed before the pattern check is done.
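X.P
XAs a sketch, assuming \fBset\_include\_prefix\fP takes a single pattern
Xstring as \fBset\_signature\_start\fP does, restricting included lines to
Xthe ``>'' style would look like:
X.Bb
X/* assumption: one pattern-string argument, like set\_signature\_start */
Xset\_include\_prefix( ">" );
X.Be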
X.H 3 "Speed & Space"
X.P
XIf you don't really care about the various parts of the article body, just
Xdo all your searches on the whole body (\fBbody\fP) -- it's faster.
X.P
XIn fact, it's worth noting that any call to scan the article body can be
Xfairly time consuming because of the large amount of disk I/O required.
XYou will probably want to scan the article body only in certain newsgroups.
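X.P
XThe short-circuit \fB&&\fP operator is a convenient way to do this.  In
Xthis sketch the body is only read for articles in comp.risks, because the
X\fBhas\fP test is skipped whenever \fBis comp.risks\fP is false:
X.Bb
Xif( is comp.risks && body has "virus" )
X	adjust 10;
X.Be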
X.P
XBecause you might do several scans on an article body, the first such scan
Xreads the body into memory.  On some machines, memory is limited.  This
Xmeans that only the first N bytes of the article can be scanned.  Most
Xmachines have no limit, although a ``small model'' news filtering program
Xon an Intel 80286 might have a limit of 40 kilobytes or so.   This is
Xrarely a problem, as the average USENET article is only 2 kilobytes in length,
Xand the rare articles that exceed 40K are mostly source code and binary postings
Xthat you probably don't wish to search at all.  (A news filter compiled
Xfor ``large model'' on the 80286 need have no limit.)
X
X.H 3 "Paragraph Scan & White Space"
X.P
XThe NewsClip library offers the ability to scan text as a range of
Xparagraphs rather than a range of lines.  If you set the integer
Xvariable \fBparagraph\_scan\fP to be true, you will activate this feature.
X.P
XIn this mode, lines will be grouped into paragraphs before they are searched.
XThat means that if you are searching for a phrase, and it happens to cross
Xa line boundary, you will still find it.  You would not find it in the
Xregular line mode.
X.P
XTo further help in this, another integer variable \fBwhite\_compress\fP
Xcan be set to be true.  If you do this, all runs of spaces, tabs, form
Xfeeds and newlines (in the case of paragraph mode) will be compressed to
Xa single space.  Thus you can search without worrying about how the poster
Xspaced his or her phrases.   This can also be achieved through clever use
Xof regular expressions.
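X.P
XHere is a minimal sketch that turns both modes on.  The declarations follow
Xthe style shown earlier for \fBpreserve\_case\fP; setting the flags at the top
Xof \fBarticle\fP, before any body search, is an assumption about when they
Xneed to be set:
X.Bb
Xextern int paragraph\_scan;
Xextern int white\_compress;
Xprocedure
Xarticle() {
X	paragraph\_scan = true;		/* group lines into paragraphs */
X	white\_compress = true;		/* squeeze runs of white space */
X	/* found even if "money" ends one line and "fast" starts the next */
X	reject if text has "make money fast";
X}
X.Be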
X.P
XIt is worth noting that these two modes default to off because it does take
Xtime to do all this processing, and if you don't need these special features,
Xyou won't want to spend the time on them in every article.  In many cases
Xyou may decide that it's more important to be efficient than to properly
Xmatch every valuable USENET article.
X
X.H 2 "Article Statistics"
X.P
XAside from searching the body of an article, you can also get statistics
Xon the sizes of the various sections.
X.P
XThe \fBbyte\_count\fP function gives the number of bytes in a text section.
XFor example, \fBbyte\_count(signature)\fP tells you how many bytes there
Xare in the signature.  With this, you can also do comparisons on the
Xrelative sizes of article sections.
X.Bb
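Xreject if byte\_count(signature) * 2 > byte\_count(text);
X.Be
Xrejects articles where the signature is more than half as big as the
Xtext of the article.  Multiplying the signature size, rather than dividing
Xby it, avoids a division by zero on articles that have no signature.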
X.P
XYou can also count lines with \fBline\_count\fP.
X.Bb
Xreject if line\_count(newtext) < line\_count(included);
X.Be
Xrejects articles that are mostly included material from another article.
X.P
XThese two forms of article statistics involve reading the whole body of
Xthe article, which takes time.   If all you want is the number of lines
Xin the body, the header variable \fBlines\fP, while not always correct,
Xis usually good enough.
X.P
XLikewise the variable \fBarticle\_bytes\fP, which gets the size of the
Xwhole article through the use of the Unix \fIstat\fP function, can be
Xa much faster way of accurately measuring the size of an article.
XYou can also get the integer variable \fBnum\_links\fP  (the number of links
Xto a cross-posted article file) and the date variable \fBwrite\_time\fP
X(the most recent time the article file was written to) which are also
Xobtained by NewsClip from a \fIstat\fP.
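X.P
XAs a sketch, assuming \fBlines\fP and \fBarticle\_bytes\fP are declared as
Xintegers like the other externals, either can screen out long articles
Xwithout reading the body.  The limits shown are arbitrary:
X.Bb
Xextern int lines;		/* from the header; may be inaccurate */
Xextern int article\_bytes;	/* from stat; assumed to be an int */
Xprocedure
Xarticle() {
X	reject if lines > 500;
X	reject if article\_bytes > 40000;
X}
X.Be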
X.P
XThe string variable \fBarticle\_filename\fP lets you know the file,
Xif any, in which the current article resides, and the integer variable
X\fBarticle\_number\fP tells you the article number, assuming it fits in
Xan integer on your machine.
X
X.H 2 "Searching for Scalars (in)"
X.P
XYou can use the \fBin\fP operator to test for the presence of a value
Xor set of values in an array or a database.  For example:
X.Bb
Xreject if #alt.flame in newsgroups;
X.Be
Xrejects articles if the newsgroup ``alt.flame'' is one of the members of the
X\fBnewsgroups\fP array.   You can think of this operator as similar to the
X``\(*e'' (set membership) operator from set theory.
X.P
XTesting for the presence of a constant newsgroup in the \fBnewsgroups\fP
Xarray is such a common thing that we have included a shorthand for it
Xthat you have already learned.  This is the \fBis\fP operator.  Thus
X\fBis alt.flame\fP is the same as \fB#alt.flame in newsgroups\fP.
X.P
XYou can search for integers in integer arrays, newsgroups in newsgroup
Xarrays, and newsgroups, userids or strings in arrays of newsgroups, userids
Xor strings.   You can even search for arrays in other compatible arrays.
XThe \fBin\fP operator returns true if any match is found.
X.P
XUnlike the pattern matching \fBhas\fP operator, exact matches are required
Xin the string comparison.
X.P
XYou can also search for strings or string compatible arrays in databases.
XMore on that in the database chapter.
X.P
XOne array you might find useful is the \fBpath\fP array.  That's an array of all
Xthe sites that have passed along the news article.  If you are programming
Xthe filtering of a news feed, you will want to ensure you don't send your
Xfeed site any articles it has already seen.  If you are feeding site
X``foo'', you will want to write:
X.Bb
Xreject if "foo" in path;
X.Be
END_OF_FILE
if test 51014 -ne `wc -c <'doc/man.mm.1'`; then
    echo shar: \"'doc/man.mm.1'\" unpacked with wrong size!
fi
# end of 'doc/man.mm.1'
fi
echo shar: End of archive 14 \(of 15\).
cp /dev/null ark14isdone
MISSING=""
for I in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 15 archives.
    rm -f ark[1-9]isdone ark[1-9][0-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0