brad@looking.ON.CA (Brad Templeton) (12/20/89)
Posting-number: Volume 9, Issue 82 Submitted-by: brad@looking.ON.CA (Brad Templeton) Archive-name: newsclip/part13 #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of archive 13 (of 15)." # Contents: doc/man.mm.3 # Wrapped by allbery@uunet on Tue Dec 19 20:10:06 1989 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f 'doc/man.mm.3' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'doc/man.mm.3'\" else echo shar: Extracting \"'doc/man.mm.3'\" \(50212 characters\) sed "s/^X//" >'doc/man.mm.3' <<'END_OF_FILE' XA brief introduction to compiling and running your NewsClip filtering Xprograms was given in chapter 2. We will now explore this area Xin more detail. X X.H 2 "Compiling" X.P XThe \fBncc\fP compiler compiles your programs by translating them into XC programs, compiling these with your C compiler, and linking the result Xwith the NewsClip library. X.P XThe translation is fairly simple as compilations go, other than providing Xfor special conversions for NewsClip's data types. It is the library Xthat does most of the work, and thus makes it easy to write a XNewsClip program. X.P XWhen you compile with X.Bb Xncc myprog.nc X.Be Xeverything is done in one step. The translated C source is placed in \fBmyprog.c\fP. XThat file is compiled, including a special file of definitions (usually found Xin \fB/usr/lib/news/newsclip/ucode.h\fP), and the result is linked with the library, Xusually found in \fB/usr/lib/news/newsclip/cliplib.a\fP. The C program Xsource is left around for you to examine. The executable program, Xready to run, is placed in the file \fBnclip\fP in your current Xdirectory. X.P XYou can alter this a bit if you like. 
For example, you can skip the XC compile and link stage with the \fI-link\fP option, allowing you to Xexamine the resulting C program and compile it on your own. Options Xare described later. X X.H 3 "Preprocessor" X.P XThe \fBncc\fP compiler passes your input program through the X``C preprocessor.'' This is the same macro language and conditional Xcompilation facility that C uses. CPP \fIdirectives\fP are all keyed by lines Xthat begin with a ``#'' character. These include the X\fB#include "filename"\fP directive, which causes the contents of the named Xfile to be inserted into the compilation stream. X.P XIf you have a lot of little filtering routines for each newsgroup that Xyou put in individual files, you can get them all combined together Xwhen you compile with \fB#include\fP directives. Your big \fBswitch\fP Xstatement might look like: X.Bb X for( n in newsgroups ) switch( n ) { X#include "news/admin/kill.nc" X#include "news/groups/kill.nc" X#include "sci/physics/kill.nc" X#include "comp/sys/ibm/pc/kill.nc" X#include "rec/humor/kill.nc" X#include "rec/humor/funny/kill.nc" X } X.Be X.P XYou could then edit each file individually, as desired. X.P XOther directives include \fB#define\fP, which defines manifest constants Xand macros, and \fB#ifdef\fP/\fB#else\fP/\fB#endif\fP, which allow Xconditional compilation based on whether or not a symbol has been defined Xwith \fB#define\fP or a command line option. X.P XA full exploration of CPP is beyond the scope of this manual. See Xdocumentation on the C language, as well as the ``man'' entry for XCPP in your own system's documentation. X.H 3 "Options" X.P XYou can control the compiling process to some degree by providing options Xto the compiler. X.P XThe compiler's primary argument is the sole input source file, which by Xconvention should end with the ``.nc'' (for NewsClip) extension. 
X.P XUntagged arguments with an extension of ``.c,'' ``.o'' or ``.a'' will not be Xtreated as NewsClip source programs, but rather as C source code, system Xobject code or library files. XThey will be passed directly to the C compiler to be linked in with your Xprogram. X.P XThe other options use LGS's own option style, which is a variant of the Xconventional Unix option style. Binary (on/off) options are preceded by Xa plus ``\fB+\fP'' or minus ``\fB-\fP,'' where plus means the option Xis turned on, and minus means the option is turned off. You can type Xa whole option name after the ``+/-,'' or just enough to uniquely Xdistinguish the option -- usually just a single letter. Thus X\fI-link\fP works as well as \fI-l\fP. X.P XValued options are written with a keyword (or perhaps the single letter Xabbreviation of the keyword), an equals sign ``\fB=\fP'' and a string Xvalue. For example, \fIo=myclip\fP. X X.H 4 "-link" X.P XThe \fI-link\fP option disables the C compile and link phase of compiling. XNo executable program will be produced. A C program with the same name Xas your source file (but with an extension of ``.c'') will be produced, Xassuming there are no errors. X X.H 4 "output=pathname" X.P XThis option specifies a name for the executable news Xfiltering program. The default is \fBnclip\fP. X X.H 4 "Define=defstring" X.P XThis option specifies a preprocessor definition to be Xpassed along to the C preprocessor. For example, \fID=bsd\fP would Xcause the manifest symbol ``bsd'' to be defined in \fB#ifdef\fP tests. XYou can specify several of these. X X.H 4 "Include=dirpathname" X.P XThis specifies a directory that the preprocessor Xshould search for files included with the \fB#include\fP directive. XYou can specify several of these. X X.H 4 "intermediate=file.c" X.P XThis allows you to specify an alternate Xintermediate name for the generated C program. Normally this name will Xbe derived from the name of the source file. 
The provided name must end Xwith ``.c.'' X X.H 4 "ccoption=option" X.P XThis lets you specify a string that Xis to be passed directly along to the C compiler for the compile and Xlink phase. You can pass any special local options your C compiler Xneeds. X X.H 4 "-externals" X.P XThe \fI-externals\fP option disables the ability of users to make Xexternal import declarations of symbols other than those in the Xapproved list of the NewsClip language. This limits the language Xto the definition in this manual. X.P XThis is only a very mild security feature, and any capable malicious Xprogrammer could get around it fairly easily. If you are going to Xallow remote sites to submit newsclip feeding programs to you, it is Ximportant that you create independent system userids for these programs, Xand run them with the real and effective userid properly set. Do Xnot use the ``uucp'' or any other system userid. XDepend on operating system tools for all your security, not this option. X X.H 3 "Single-User" X.P XIf you only have a single user copy of NewsClip, and, because you Xare not a system administrator, you have been unable to install XNewsClip files in system directories, then the files \fBcliplib.a\fP Xand \fBucode.h\fP must be in your current directory when you compile. X X.H 2 "Externals" X.P XSo long as the \fI-externals\fP compiling option is not used, NewsClip Xprograms may make external declarations for arbitrary C routines. This Xincludes routines from the standard C library, or routines from Xspecial C source or object code modules provided on the \fBncc\fP Xcommand line. X.P XFor users willing to write their own C code, the potential here is Xtruly unlimited. The NewsClip language has been designed to be Xsimple and special purpose. There are some less common things that Xare simply not easy to do within it. External functions can do all Xthis for you. 
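As an illustration of the C side of such an external, here is a minimal sketch of a helper that could live in its own ``.c'' file and be named on the \fBncc\fP command line. The function name, its purpose and its signature are hypothetical. This section does not say how NewsClip strings and integers are passed to C, so treat the argument types as an assumption to check against your own copy of \fBucode.h\fP. The name is kept entirely in lower case, as the manual requires of externals.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical external routine: count the non-overlapping occurrences
 * of `needle` in `text`.  The all-lower-case name is deliberate, since
 * NewsClip ignores letter case and can only refer to lower-case C symbols.
 * The (const char *) arguments are an assumption about how strings
 * arrive from NewsClip, not something this manual section confirms.
 */
int count_occurrences(const char *text, const char *needle)
{
    int count = 0;
    size_t step;

    if (text == NULL || needle == NULL)
        return 0;
    step = strlen(needle);
    if (step == 0)
        return 0;               /* an empty needle never matches */

    while ((text = strstr(text, needle)) != NULL) {
        count++;
        text += step;           /* move past this match */
    }
    return count;
}
```

A routine like this does a job that would otherwise need a loop inside the filter program, and it can be tested on its own with an ordinary C test harness before it is linked into a NewsClip program.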
X.P XEven if you have source code to the NewsClip compiler, we advise you Xto do any special tricks with your own C code, rather than by changing Xthe compiler to extend the language. Neither route is officially Xsupported, but the former is preferred. X.P XImportant note: Since the case of letters in NewsClip doesn't matter, Xall C externals must be entirely in lower case. If you want to call Xan existing routine that has upper case letters in its name, you will Xhave to write a small interface routine to do the calling. With variables Xthat have upper case names, you will be out of luck. X X.H 2 "Filtering" X.P XOnce you have compiled your program, there are several ways you can Xrun it to filter news articles. We'll assume your program is in X\fBnclip\fP for now. First of all, \fBnclip\fP has a number of Xcommand line options you can use to control its operation. X.P XMost important are the ``modes'' of operation, specified with the X\fImode=\fP option. Essentially, you have written a subroutine which, Xwhen passed an article, decides whether to accept or reject that article. XThe control portion of the \fBnclip\fP program sets up how the articles Xwill be gathered and submitted to your procedure, and what will be done Xwith the results. X.P XYou are already familiar with \fInewsrc\fP mode, which you get by Xusing the \fImode=newsrc\fP option. We will explain it in more Xdetail here. X X.H 3 "Newsrc Mode (mode=newsrc)" X.P XIn \fInewsrc\fP mode, the \fBnclip\fP program processes a standard Xformat \fB.newsrc\fP file. Most newsreaders keep track of what the Xuser has read with a file of this name in the home directory. The XRN newsreader also keeps other files in the same directory as this file. X.P XIn \fInewsrc\fP mode, \fBnclip\fP also keeps a file Xcalled \fB.newsrclas\fP to keep track of the last article that has been Xprocessed by the \fBnclip\fP program in each desired newsgroup. 
This Xis necessary because it's not possible to tell where to start processing Xjust from the \fB.newsrc\fP file and the news \fBactive\fP file. X.P XWhen run in \fInewsrc\fP mode, \fBnclip\fP examines the \fB.newsrc\fP Xfile, \fB.newsrclas\fP file and the USENET active file X(usually \fB/usr/lib/news/active\fP). From these it calculates the Xrange of unread articles that must be processed. X.P XFirst it calls your \fBinit\fP procedure. X.P XIt then loops through the subscribed newsgroups in the \fB.newsrc\fP Xfile. As it starts each group, it calls your \fBstartgroup\fP procedure. XIt then goes through all the appropriate articles, and calls your X\fBarticle\fP procedure on each one. Each rejected article is marked Xas read. When the group is done, the \fBendgroup\fP procedure is called. X.P XWhen all is done, the \fBterminate\fP procedure is called, and the X\fB.newsrc\fP file is written out, with all the rejected articles marked Xas read. The \fB.newsrclas\fP file is written out with all articles Xmarked as processed. (This way, if you call \fBnclip\fP again immediately, Xit will do nothing unless new articles have arrived on your machine.) X.P XSome options and environment variables affect this procedure. See below. X.H 3 "Filter Mode (mode=filter)" X.P XThis mode works quite differently, and does not even involve the X\fB.newsrc\fP or \fBactive\fP files. Instead, it expects a list of Xfilenames to appear on the standard input. Each file should be a XUSENET article file. Each such article will be passed to your X\fBarticle\fP procedure. If the article is accepted, its filename Xwill be written to the standard output. If the article is rejected, Xnothing is written. X.P XThe result is a filtered list of accepted filenames. This is ideal Xfor controlling a batched feed to another site. Many news systems run Xby having the news processing programs output a list of article files Xto a special file. 
Periodic programs examine this file and batch together Xthe articles found in it. X.P XSimply modify your batching procedure to have the file processed by X.Bb Xnclip <batchfile X.Be Xand feed the output list into your batcher. Beware that it might be Xempty! X.P XNote that the entry point procedures \fBstartgroup\fP and \fBendgroup\fP Xwill not be called in this mode, as there is no definition of when a Xgroup starts and when a group ends. X.H 3 "Batch Mode (mode=batch)" X.P XThis mode is an alternative to \fIfilter\fP mode for generating a list Xof accepted article files. Instead of taking input from a file list, it Xtakes it from a \fB.newsrc\fP and \fBactive\fP file, just like X\fInewsrc\fP mode. X.P XThe accepted files have their filenames printed to the standard output. XThe \fB.newsrc\fP file is updated to mark \fBall\fP the articles as Xread, whether they were accepted or rejected. This makes the counts in Xthe \fB.newsrclas\fP file somewhat redundant, but they are still used, Xbecause using them makes the process more efficient. X.P XThis way, you can maintain a feed through a \fB.newsrc\fP file, and Xhave no entry in the news \fBsys\fP site subscription file. You get Xcontrol on a newsgroup by newsgroup basis, and of course the full Xfiltering ability of \fBnewsclip\fP. The only thing that is not handled for you Xis the adding of new newsgroups in subscribed hierarchies. XTo do this, you must process \fBControl\fP messages in the \fBcontrol\fP Xnewsgroup, and use the \fBsubscribe\fP procedure to add them to the X\fB.newsrc\fP. The sample program \fBfeed.nc\fP shows how to do this. X.P XIn this mode, the \fBstartgroup\fP and \fBendgroup\fP entry points Xare used. X.P XWhen using \fIbatch\fP mode, it is advisable to use the \fInewsrc=\fP Xoption to explicitly specify the location of the \fB.newsrc\fP file. X.P XIn \fIbatch\fP mode, it is strongly suggested that your programs import Xthe \fBxref\fP variable. (You don't need to do anything with it, just Xextern it.) 
This will assure that cross posted articles are not Xexamined or accepted twice. If your news system does not support Xthe \fBXref:\fP line, then you must use another scheme to avoid Xduplicating crossposts. See the sample feed program for details. X X.H 3 "List Mode (mode=list)" X.P XThis mode reads from a \fB.newsrc\fP and \fB.newsrclas\fP file, Xand outputs a list of accepted article filenames, just like \fIbatch\fP Xmode. It does not, however, update the \fB.newsrc\fP file, so if you Xrun it multiple times, you will get the same list, or possibly an Xextended one if new articles have arrived. X.P XAll the same warnings that apply to batch mode apply here. X X.H 3 "Pipe Mode (mode=pipe)" X.P XIn this mode, \fBnclip\fP expects to enter a dialogue with the program Xthat called it, which is assumed to be a newsreader. The program Xtakes commands on the standard input, assumed to be a pipe from the Xnewsreader, and gives back answers on the standard output, assumed to Xbe a pipe back to the newsreader. X.P XIn this case, you don't actually run your \fBnclip\fP program. Your Xnewsreader calls it for you and does all the talking to it that's Xrequired. We have adapted many newsreaders to work in this way, including Xthe popular RN newsreader. X.P XIn general, the commands ask \fBnclip\fP to examine articles, and Xthe answers accept or reject the articles. A typical newsreader would Xfilter all articles through the concurrent \fBnclip\fP process before Xpresenting them to the user. X.P XThe actual command structure is beyond the scope of this chapter. It Xis documented in a special manual available free from Looking Glass XSoftware Limited. X.P XThere are two things to be aware of here. When a newsreader starts a Xnew newsgroup, it may query the filter program about the group in general. XThis will cause a call to \fBstartgroup\fP. 
If you set the Xspecial \fBaccept\_all\fP or \fBreject\_all\fP flags, this will be Xcommunicated to the newsreader, which can then decide not Xto filter more articles in that newsgroup. X.P XIf your newsreader is the type that likes to do all its filtering right Xat the start of a group, you will soon discover that you don't Xwant to filter all groups like this. X.P XThe communication protocol also has a facility so that the newsreader X(or perhaps the user) can issue ``kill'' commands to the news filter. XSuch commands would be intended to tell the filter to store strings like Xmessage-ids and users in its databases. The interpretation of these Xcommands is up to you. X.P XWhen such a command comes, the entry point \fBcommand\fP will be called, Xwith the command string as a single string argument. You should check Xand process the command. If it is a valid command, terminate by Xissuing an \fBaccept\fP statement. If it is an invalid command, terminate Xby issuing a \fBreject\fP statement. The default is to reject. If you Xdon't define a \fBcommand\fP procedure, all commands will be rejected. X.P XMore information on \fIpipe\fP mode may be included in the documentation Xfor readers that support it. The interface is general, so that any kind Xof news filtering program can be adapted to it -- not just those Xcompiled with the NewsClip system. X X.H 2 "Using It" X X.H 3 "Pipe Mode" X.P XThe ideal mode of operation for NewsClip programs is a smart newsreader Xthat can talk to the program in \fIpipe\fP mode. To do this, compile Xyour program as \fBnclip\fP (that's the default) and place it either Xin the same directory as your \fB.newsrc\fP file, or in one of the Xdirectories named in your \fBPATH\fP environment variable. In Xmost cases, your home directory is the place. X.P XThen run your newsreader. It should start up your \fBnclip\fP program Xand talk to it for you. There will be nothing for you to do. 
X.P XIf your smart newsreader uses the standard \fB.newsrc\fP file, then you Xcan still run your program in \fInewsrc\fP mode as described below. You Xmay find this is a handy way to save time. Run the program on your X\fB.newsrc\fP at night or in the background. This will scan articles Xand update your \fB.newsrc\fP so that it's already done when you Xstart reading. X.P XThis is particularly useful with large groups in which you reject almost Xall the articles. X X.H 3 "Newsrc Mode" X.P XIf a smart newsreader is not available, or even if one is, you can Xuse your filter program with any newsreader that understands the X\fB.newsrc\fP file in \fInewsrc\fP mode. X.P XSet up your filter program and test it. Then arrange to run it Xregularly in the background with: X.Bb Xnclip mode=newsrc X.Be X.P XIt will check all the new articles, and get rid of the ones you don't Xwant. Run this at night from your \fBcron\fP if possible. You can also start Xit up in the background from your \fB.login\fP or \fB.profile\fP Xscript when you log in to your system, and just wait a short time before Xyou start reading news. X.P XIf new articles arrive during your newsreading session, your newsreader Xwill show them to you, of course, as they have not been filtered. There Xis no easy way around this. If you complete a newsreading session, rather Xthan going around for a second session immediately, you should quit, Xrun your filter program again, and go back into the newsreader. This Xshould help you avoid articles that should be rejected. You will still Xsee the odd one, but that should not be a big deal. X.P XIf you want to get fancy, you could leave groups unsubscribed, and use Xthe \fI+unsubscribed\fP option (see below) to only show those groups Xto you after processing. Unfortunately, you would need to unsubscribe from Xall the groups at the end of your session, and there is no mechanism to Xdo this. 
X.P XOne idea is to set up a ``las'' file with the names of your unsubscribed Xgroups, and then set up a special NewsClip program to search through them. XRun this program at night every few days with \fI+only\fP, \fI+unsubscribed\fP Xand the \fIlas=\fP option. As you only run this irregularly, you can do Xthings like full text searches for important keywords. X.P XThe \fIpipe\fP mode system has been designed to be added simply to most Xnewsreaders. Patches exist or are under development for many of the Xpopular newsreaders. X X.H 3 "Options & Environment" X.P XTwo options and environment variables let you specify where the X\fB.newsrc\fP and related files will reside in the modes that deal Xwith a \fB.newsrc\fP file. (The options supersede the environment Xvariables.) X X.H 4 "directory=dirpath" X.P XThis option and the \fBDOTDIR\fP environment Xvariable let you specify the directory to look for the \fB.newsrc\fP, X\fB.newsrclas\fP and \fB.rnlock\fP files. (The \fB.rnlock\fP file Xis RN's way of ensuring two programs don't go at the \fB.newsrc\fP at once.) X X.H 4 "newsrc=pathname" X.P XThis option specifies an exact location for the X\fB.newsrc\fP Xfile. The name of the \fB.newsrclas\fP file is generated by appending X``las'' to that name, so you should ensure there is enough room in the Xfilename to do this. The \fB.rnlock\fP file is not used. X X.H 4 "las=pathname" X.P XThis lets you explicitly set the name of the last Xarticle seen file. This is handy with the \fI+only\fP option. X.P XUse this option when testing, when in \fIbatch\fP mode, or when dealing Xwith a file generated by the \fBmknewsrc\fP program. X X.H 4 "option=string" X.P XThis lets you specify options that will be Xpassed down to the NewsClip program. The option strings are placed Xin the global string array named \fBoptions\fP. The user can import Xthis array and search for items in it. 
X.H 4 "+only" X.P XThe \fI+only\fP option specifies that only those groups already named Xin the \fB.newsrclas\fP file should be processed and filtered. (This Xonly applies in the \fInewsrc\fP, \fIlist\fP and \fIbatch\fP modes.) X.P XNormally if the \fB.newsrclas\fP file is missing, or if subscribed Xgroups are not found within it, they are added with a default last Xarticle seen of zero. With the \fI+only\fP option, no new groups Xwill be added. X.P XThis way, you can confine NewsClip processing to just a specific list of Xgroups. You can also do this internally with the \fBaccept\_all\fP Xand \fBreject\_all\fP variables. X X.H 4 "+unsubscribed" X.P XThis option causes the program to process even the Xunsubscribed groups found in the \fB.newsrc\fP file. If any article Xin an unsubscribed group is accepted -- this is assumed to be a rare case -- Xthen the group will be resubscribed so that you see it in your next Xnewsreading session. X.H 4 "warning=level" X.P XSets a warning level, currently from 0 to 4. The default is 1. The Xhigher the level, the more warnings you will get. Warnings are printed Xto the standard error output. X.H 4 "Spooldir=dirpath" X.P XSpecify an alternate news spool directory. This is for use by users Xwith a binary-only copy of NewsClip that use a machine with a non-standard Xspool directory. X.H 4 "Libdir=dirpath" X.P XSpecify an alternate news library directory. This is for use by users Xwith a binary-only copy of NewsClip that use a machine with a Xnon-standard library directory. X X.H 2 "Making .newsrc files" X.P XCompiling NewsClip programs is not difficult, and it's quite fast, assuming Xyour system's C compiler is of reasonable speed. This means that XNewsClip can make an ideal language for special purpose scans of the Xnews spool directories. X.P XTo help in this, we have created a special program called \fBmknewsrc\fP. 
XIt can make up a sample \fB.newsrc\fP file for you to use as input to X\fBnclip\fP in \fInewsrc\fP mode, named using the \fInewsrc=\fP option. X.P XThe \fBmknewsrc\fP program makes, by default, a \fB.newsrc\fP that Xshows every article in every group on the system as unread. This lets Xyou scan all the news spools with your filter program. X.P XIn the end, you will get a modified \fB.newsrc\fP with only the desired Xarticles marked unread. You can then point your newsreader at this new X\fB.newsrc\fP, perhaps with the \fBNEWSRC\fP or \fBDOTDIR\fP Xenvironment variables that \fBnclip\fP also uses. You will then get Xa newsreading session of just the desired articles. (Be sure to reset Xyour environment variables afterwards!) X.P X\fBmknewsrc\fP outputs the \fB.newsrc\fP style file on the standard Xoutput. You should redirect that where you want it. X.P XThere are some useful options to \fBmknewsrc\fP to help you cut down Xyour search. You will find them necessary, as a full search of a Xlarge system's complete USENET spools can take Xscores of minutes or even hours of disk I/O time. X.P XFirst of all, you may provide regular expression patterns as command Xline arguments. If you do, you will only be provided with newsgroups Xthat match those patterns. For example, \fI^comp\\..*\fP would give Xyou all the groups in the ``comp'' hierarchy. (Be warned that just Xas with \fBgrep\fP, you will have to escape certain special characters Xto save them from shell processing.) X.P XYou can also ask to scan only the most recent articles in each selected Xgroup. With the \fIpercent=%age\fP option, you can specify a number Xfrom 1 to 100 that tells what percentage of the available articles should Xbe marked unread. (You always get the most recent set.) X.P XThe \fI+newsrc\fP option arranges so that you only see groups that Xare marked as subscribed in your own \fB.newsrc\fP. The same rules and Xenvironment variables that \fBnclip\fP uses to find your \fB.newsrc\fP Xapply here. 
X.P XThe \fInewsrc=filename\fP option implies \fI+newsrc\fP, and specifies where Xthe \fB.newsrc\fP is to be found. The output is still written to Xthe standard output, however. X.P XHere are some typical steps, with an example following: X.AL X X.LI XWrite a newsclip program and compile it with \fBncc\fP. You may want Xto put the executable in a different place, for example \fBsrch\fP. X.LI XBuild a temporary \fB.newsrc\fP file for half the articles in the X``comp'' groups. X.LI XFilter for the articles you like. X.LI XRead the news. Then reset \fBDOTDIR\fP if you set it. X.LE X.Bb Xncc srch.nc o=srch Xmknewsrc p=50 '^comp\\..*' >/tmp/me/.newsrc Xsrch m=n n=/tmp/me/.newsrc Xsetenv DOTDIR /tmp/me Xrn Xsetenv DOTDIR $HOME X.Be X X X X X.H 1 "Tips and Traps" X.P XIn this chapter, we remind you of some important things to remember when Xcoding your NewsClip programs. In particular, important differences from XC are pointed out. X X.H 2 "Memory" X.P XDon't create any loops that keep allocating strings -- for example with X\fBconcat\fP. Temporary memory is just allocated in a big stack, and Xit is never freed up until an article is done. A loop could easily make Xyou run out of memory, aborting your session. X.P XNaturally, be equally careful of permanent memory that you allocate in Xdatabases and permanent strings. Be sure to free all databases that Xyou are not using. (This is not necessary within the \fBterminate\fP Xprocedure.) X.P XRemember, when you read a database in from a file, you still get a database Xthat uses some memory, even if the file is missing or empty. X X.H 2 "Strings" X.P XMake sure all your search strings are in lower case letters, unless you Xknow you are searching a text field or string that has not been converted Xto lower case. Normally almost all such things are pre-converted to Xlower case, so if you put upper case in your patterns or test strings Xyou will not get a match. 
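The effect of that pre-conversion can be shown with a small C sketch. It is an analogy only: it uses a plain substring search, not NewsClip's pattern matcher, and both function names are hypothetical. The point is that once the text has been folded to lower case, a pattern containing capitals can never match it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (at most dstsize - 1 characters), folding to lower case. */
void fold_lower(char *dst, size_t dstsize, const char *src)
{
    size_t i = 0;

    if (dstsize == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dstsize; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[i] = '\0';
}

/*
 * Return 1 if `pattern` occurs in the lower-cased copy of `field`.
 * The pattern itself is NOT folded, mirroring the trap described above:
 * the caller must supply it in lower case.
 */
int folded_field_has(const char *field, const char *pattern)
{
    char folded[256];           /* illustrative fixed-size buffer */

    fold_lower(folded, sizeof folded, field);
    return strstr(folded, pattern) != NULL;
}
```

With a subject of ``Re: ROT13 Jokes'', searching for ``rot13'' succeeds and searching for ``ROT13'' fails, even though the capitals are what appeared in the original header.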
X X.H 2 "Integers" X.P XIf your machine only supports 16 bit integers, you can only place values Xfrom -32768 to 32767 in your integers. It is very easy to overflow. XIn fact, in some newsgroups, the article numbers may already overflow Xyour integers. X.P XOne place to watch out is the running \fIscore\fP that you modify with Xthe \fBadjust\fP statement. If you adjust the score beyond the range Xof an integer, it could wrap around, causing exactly the wrong result. X.P XMake sure your adjustments are appropriate, and not so large that they Xmight overflow if they all go the same way. If you are worried that Xyou might reach overflow at a given point, import the \fBscore\fP variable Xand put the following statement in at various points in your procedure. X.Bb Xif( score > 25000 || score < -25000 ) X return; X.Be XThis will stop the process if the score gets ridiculously high or low. X.P XSome of the functions returning large things like Xarticle sizes will compensate for small integers by returning the Xlargest integer (ie. 32767) when the actual result is out of bounds. XYou may wish to watch for this if you were counting on an exact result. X.P XDate/time variables will always be able to hold more than a 16 bit integer, Xbut their use as anything but date values is discouraged. X X.H 2 "Nil Headers" X.P XIf you use any array, userid or string header variables that are not Xguaranteed to be in an article, then you should always check to make Xsure the variables don't have a nil value before you use one. If Xyou assign into some index of a nil array, you could get into real Xtrouble. X.P XUsually you do this with a short circuit \fB&&\fP operator, as in: X.Bb Xif( keywords != nilarray && "rot13" in keywords ) X reject; X.Be X.P XWith integer and date variables, you will only get a zero value, so it Xmay not be absolutely necessary to check, but it is still always a good Xidea. 
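For readers who think in C, the guard above resembles checking a pointer before following it. The sketch below is an analogy under stated assumptions, not the code that \fBncc\fP generates: the \fBstr_array\fP structure and both function names are hypothetical, and a null \fBitems\fP pointer is used to stand in for a nil array.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a string array header such as keywords.
 * items == NULL models a nil array (the header was absent).
 * items != NULL with count == 0 models an array with no entries.
 */
struct str_array {
    const char **items;
    size_t count;
};

/* Return 1 if the array is nil, 0 otherwise. */
int array_is_nil(const struct str_array *a)
{
    return a->items == NULL;
}

/* Return 1 if `word` is an element of the array; a nil array holds nothing. */
int array_contains(const struct str_array *a, const char *word)
{
    size_t i;

    if (array_is_nil(a))        /* test for nil before touching items */
        return 0;
    for (i = 0; i < a->count; i++)
        if (strcmp(a->items[i], word) == 0)
            return 1;
    return 0;
}
```

Keeping the nil test separate from the membership test is what lets a program tell an article with no such header apart from one whose header simply lacks the word.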
X X X.B "Important Note:" X.P XRemember this: A nil array is not the same as an empty array. A nil string Xis not an empty string (\fB""\fP). If you use variables that might Xbe nil, beware. X.P XIn general, it's a good idea to use variables that can't be nil, Xsuch as \fBrdistribution\fP. You can also make your own functions to Xdo certain tasks for you. For example: X.Bb Xstring Xsafestring( string s ) X{ X return s == nilstring ? "" : s; X} X.Be Xcould be applied so that nil strings always become empty strings. You Xcould also define this as a CPP macro, or just use the \fB?\fP query Xoperator wherever necessary. X.P XThe nil values are important, as they let you test if a header field was Xpresent in the article at all. In some cases, such as the \fBApproved:\fP Xheader, the important thing is that the header is present. Currently, at Xleast, it doesn't matter what's in it. X.H 2 "Cross Posting" X.P XIf you are writing a program to be used in \fIbatch\fP mode, be sure to Xinclude the declaration: X.Bb Xextern string array xref; X.Be Xsomewhere in your program, or use some other system to avoid duplicate Xarticles. X.P XYou may want to do this \fBextern\fP even in \fInewsrc\fP mode, to Xsimplify processing. It does nothing in the processing modes that don't Xwork with a \fB.newsrc\fP file, like the \fIpipe\fP and \fIfilter\fP modes. X.P XAnother way to eliminate crossposts is to reject all articles where the Xfirst newsgroup in the \fBnewsgroups\fP array is not the current newsgroup, Xso long as that first newsgroup is a \fBnewsrc\_group\fP. (If it isn't Xyou will want to key on the first newsgroup in the array that is found Xin the \fB.newsrc\fP.) X X.H 2 "Speed" X.P XDon't import externals that you don't need. Sometimes just importing Xan external variable requests pre-processing that takes time. XThis applies to Xall the header variables, along with \fBdistribution\_level\fP and some Xof the statistical variables. 
X.P XBe conservative with your use of references to segments of the article Xbody. This can involve lots of disk I/O if you have lots of articles Xto scan. We advise that you keep body scans to your newsgroup specific Xcode. If you have a body scan for every article, you can expect the Xprogram to take a lot more time. Of course, NewsClip is quite fast, Xso this may be acceptable, particularly if it saves \fIyou\fP time. X.P XTry to use the variables like \fBlines\fP and \fBarticle\_bytes\fP Xthat don't usually require the reading of the whole article. Note Xthat \fBarticle\_bytes\fP sometimes does have to read the whole article Xwhen you are running in pipe mode on a system that doesn't have the Xnews article files. X.P XIn general, your code is getting compiled to C, and thus directly to Xmachine code. Don't be afraid of loops and integer operations in your code. XThey should go quite quickly. X.P XOptimize where you can with the use of the \fI+only\fP option or the X\fBreject\_all\fP and \fBaccept\_all\fP variables. Try the \fBnamed\_group\fP Xtrick described in the chapter on general technique. X.P XStick to simple patterns where possible -- they search faster. Also, Xuse constant patterns where you can. When your NewsClip program is Xrun, your constant patterns (quoted strings to the right of a \fBhas\fP Xoperator) get converted into the internal regular expression language only once, Xinstead of each time a search is done. X.P XIn particular, the or-bar (\fB|\fP) regular expression feature is not very Xefficient. It can often be significantly faster to code: X.Bb Xbody has "foo" || body has "bar" || body has "abc.*def" X.Be Xthan X.Bb Xbody has "foo|bar|abc.*def" X.Be Xparticularly if you put the most likely patterns first. X X.H 2 "Patterns" X.P XDo be sure to watch out for the regular expression ``metacharacters.'' XThese are ``\fB^$.[]()+?|\\*\fP''. 
If you're an \fBed\fP or X\fBgrep\fP user, this will be second nature to you, although you Xshould still watch out for the extra \fBegrep\fP characters, particularly Xthe parentheses, plus, question mark and or-bar. X.P XIf you wish to store a literal string in an array or database for later Xuse in searching, you may wish to apply the string function X\fBliteral\_pattern\fP to it. This is always wise if you're taking Xsomething like a subject line, which could contain all sorts of Xcharacters. X X.H 2 "Databases" X.P XIf you regularly search for a string array in a database, such as the Xpopular search for \fBreferences\fP in a database of bad message-ids, then Xonly the first entry found will get its ``access time'' updated. If the Xwhole \fBreferences\fP array is found in the database, only the first Xwill get marked as accessed. X.P XThis means that the later IDs will eventually fade away from the database. XThis should not present a problem, since they will all be children of Xthe parent ID in normal circumstances. X.P XIf this could cause a problem, you will have to write your own \fBin\fP Xfunction, which performs a loop, and doesn't stop after an entry is found. XThis will update all entries, but it might take a bit longer. X X.H 2 "Working With Newsreaders" X.P XSome newsreaders, like RN, have a powerful macro language. You will find Xthat it is possible in RN to define macros that will do automatic updates Xof your databases of bad messages, bad users, good or bad subjects or Xwhatever you please. If you build your NewsClip program from a Xseries of \fB#include\fPd group files, you can even set up macros to Xdo automatic edits of those files when desired, and then recompile the Xwhole thing with a \fBMake\fP file. See the RN manual for details. X.P XYou can also issue commands to your NewsClip program Xdirectly from a modified reader like RN. See our special appendix on Xthat topic. 
X X.H 2 "Kill Files" X.P XExactly duplicating the kill file interface of RN is not simple, although Xit can be done. The interface in NewsClip is, of course, much more Xflexible. RN's kill files can issue commands on articles that match Xpatterns in the subject line, the entire header or the body. It's Xeasy to do pattern searches in the subject or article body with NewsClip. XYou can't search the entire header, but the RN header search was only Xprovided to simplify the KILL file interface. X.P XIf you want something that's like a kill file, just read a local KILL Xdatabase for your newsgroup and say: X.Bb Xreject if subject has killdb; X.Be Xor X.Bb Xreject if body has killdb; X.Be XIf you want to keep it all in one database, you could read in the Xdatabase, and then do a loop splitting the database into a bunch of Xdifferent arrays or databases of patterns, using the integer key values. X X.H 2 "Variant Parsing" X.P XYou may not wish to have your header lines handled the same way in Xevery newsgroup. For example, in one newsgroup you might wish the X\fBkeywords\fP line to be delimited with spaces, and in another you Xmight wish commas. (Normally it uses commas.) X.P XYou can't do that with the normal header variable declaration system, Xas the parsing of the header variables is done before you get to process Xthe article yourself. X.P XThe solution is to define your header variables as simple strings, as in: X.Bb Xheader string keywords : "keywords"; X.Be Xand then parse the string yourself. For example: X.Bb Xstring array keys; Xswitch( main\_newsgroup ) { X case #rec.humor.funny: X parse keys = keywords, "S,"; X accept if laugh in keys; X break; X default: X parse keys = keywords, " "; X if( keys has "^foo" ) X adjust 20; X break; X } X.Be X X.H 2 "Feeding Sites" X.P XIf you use NewsClip's \fIbatch\fP mode to feed other sites (or users) Xfrom a \fB.newsrc\fP file, you must be sure to include the group X``control'' in the list of subscribed groups. 
This will pass control Xmessages (cancellations of articles, etc.) to your feed site. X.P XWhile it should usually do little harm to pass all control messages, you Xmay wish to filter them further. The ``control'' group is unusual, in Xthat the groups on the \fBNewsgroups:\fP line will not include \fBcontrol\fP, Xbut will rather be the groups to which the control message applies. X.P XYou may wish to forward control messages only if they include a group you Xalready subscribe to. The \fBnewsrc\_group\fP function tells you if a group Xwas one of those listed in the \fB.newsrc\fP file. You may also wish Xto include hierarchies of control messages to catch new group creation Xmessages. You may wish to filter out boring ``ihave/sendme'' protocol Xcontrol messages by looking at the control line. X.P XNewsgroup creation messages get posted to the special pseudo-group, X``\fIgroupname\fP.ctl.'' Thus the creation message for ``comp.misc'' Xwas ``posted'' to ``comp.misc.ctl'' -- watch for that. Special control Xmessages may also be posted to fake groups that end in ``.ctl.'' This Xmeans you may wish to use pattern matching on your newsgroup names instead Xof the usual exact match schemes. X.P XIf you catch a creation message that you want to propagate, you may also Xwish to add the created group to your \fB.newsrc\fP file. Use the X\fBsubscribe\fP procedure to do this. X.P XFeeding with a \fB.newsrc\fP has some powerful advantages. For example, Xit's easy to have a complex subscription list. You can even combine Xall the \fB.newsrc\fP files from the remote site, add ``control'' and build Xa file that only sends what is actually read. X X.H 2 "Examples" X.P XHere are some examples of how to code for common actions. Some of these Xexamples are conditional expressions, which you can then use in \fBif\fP, X\fBreject if\fP or \fBaccept if\fP statements, as desired. In most Xcases, these examples are code fragments, and not complete programs. 
It Xis assumed that they exist within larger programs. (For example, it's Xpointless to have a program that just does \fBaccept if\fP, as \fBaccept\fP Xis the default action.) X X.H 3 "My Own Articles" X.P XTo see your own articles and all followups to them: X.Bb Xdatabase myarticles; Xextern string message\_id; Xextern userid from; Xextern string array references; Xprocedure init() X{ X myarticles = read\_database( "~./News/myarts" ); X} Xprocedure article() X{ X extern string my\_mail\_address; X if( from == my\_mail\_address ) { X myarticles[message\_id] = true; X accept; X } X if( references != nilarray && references in myarticles ) X accept; X /* more code */ X} Xprocedure terminate() X{ X extern datetime time\_now; X write\_database( myarticles, "~./News/myarts", time\_now - month ); X} X.Be X X.H 3 "Local Articles" X.P XShow me articles by people from my site: X.Bb Xextern userid from; X{procgap} Xextern string my\_domain; Xextern string domain( string ); Xaccept if domain( from ) == my\_domain; X.Be X.H 3 "Locally Distributed Articles" X.P XShow me articles posted for citywide distribution or smaller: X.Bb Xextern int distribution\_level; Xextern int dlevel( newsgroup ); Xaccept if distribution\_level <= dlevel(#city); X.Be X.P XYou may want to filter by distribution based on the group. In some groups Xyou might want to read the whole netwide stream, and in others you might Xwant to read only the local stream. In some groups, you might even want to Xeliminate the local stream. X.H 3 "Crossposting" X.P XAn article might be considered too heavily crossposted if X\fBcount(newsgroups) > 4\fP. On the other hand, you might decide in Xsome groups to only read articles unique to the group with: X.Bb Xcase #news.admin: X reject if count(newsgroups) > 1; X break; X.Be X.P XYou might want to be a bit more lenient than that. 
The following code: X.Bb Xextern newsgroup main\_newsgroup; Xreject if main\_newsgroup != newsgroups[0]; X.Be Xrejects articles where the primary newsgroup isn't the one you Xare currently processing. This catches messages that were posted to your Xgroup as a possible afterthought. You might wish to give them a lower Xscore or reject them out of hand. Of course, if you do subscribe to Xthe primary newsgroup (first on the \fBnewsgroups\fP list), then you Xwill still see the article in that group. If you don't subscribe, you Xwon't see it at all. X.H 3 "Eliminating a User" X.P XYou can eliminate a list of users from ``your'' net, so that you don't Xsee their articles, and you don't even see followups to their articles. X.Bb Xdatabase badusers; Xdatabase badarticles; Xextern string message\_id; Xextern userid from; Xextern string array references; Xprocedure init() X{ X badusers = read\_database( "~./News/badusers" ); X badarticles = read\_database( "~./News/badarts" ); X} Xprocedure article() X{ X /* does it come from a nasty user? Mark it */ X if( from in badusers ) { X badarticles[message\_id] = true; X reject; X } X reject if references != nilarray && references in badarticles; X /* more code */ X} Xprocedure terminate() X{ X extern datetime time\_now; X write\_database( badarticles, "~./News/badarts", time\_now - month ); X} X.Be X.H 4 "\fIReally\fP Eliminating a User" X.P XThere are still many sites out there that don't build proper X\fBreferences\fP chains on their articles. To really eliminate followups Xto an article, you have to do more than add the message id to a database of Xbad messages. If the article is an original, with no ``Re:'' at the Xfront of the subject, you should also add the subject line to a Xdatabase of bad subjects. X.P XAnd if you want to get really fancy, you could have your program search Xarticle bodies for mentions of the user's name. 
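X.P
XAs a rough sketch of the bad-subjects idea, the \fBarticle\fP procedure
Xfrom the previous example might be extended along these lines. The
Xdatabase name \fBbadsubjects\fP is made up for this illustration, and the
X\fBextern\fP declaration shown for \fBliteral\_pattern\fP is an assumption
Xmodelled on how other library functions are imported; check both against
Xyour release before relying on them.
X.Bb
Xdatabase badsubjects;    /* hypothetical name, handled like badarticles */
Xextern string subject;
Xextern string literal\_pattern( string );    /* assumed declaration */
Xprocedure article()
X{
X    if( from in badusers ) {
X        badarticles[message\_id] = true;
X        /* an original article: store its subject as a literal pattern */
X        if( !( subject has "^Re:" ) )
X            badsubjects[literal\_pattern( subject )] = true;
X        reject;
X    }
X    reject if references != nilarray && references in badarticles;
X    /* catch followups whose references chain is missing or broken */
X    reject if subject has badsubjects;
X    /* more code */
X}
X.Be
XThe \fBbadsubjects\fP database would presumably be read in \fBinit\fP and
Xwritten out in \fBterminate\fP, just as \fBbadarticles\fP is. Storing the
Xsubject through \fBliteral\_pattern\fP keeps stray metacharacters in a
Xsubject line from being treated as part of a regular expression.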
X.H 4 "If you Eliminate a User" X.P XIf you decide that you would be better off eliminating the postings of Xa USENET user, it would be a good idea to send a brief mail note to this Xuser indicating that you have done so, possibly including the reason Xwhy. X.P XSome users who make annoying mistakes on USENET may not realize that Xthey are making mistakes, or they may not realize the extent to which Xthey are annoying people. If they are informed that some readers have Xdecided to read no more of their writing, they may decide to change Xtheir behavior. That is up to the poster, of course. X.H 3 "Included Text & Signatures" X.P XYou may not like long rebuttal articles with lots of included text. XIn some groups, you could then include: X.Bb Xextern int lines; X{procgap} Xreject if lines > 50 && lines / line\_count( included ) < 2; X.Be Xwhich rejects long articles that are more than half included text. X.P XYou could also reject (or lower the score of) articles that are short Xand have big signatures. X.Bb Xextern int lines; X{procgap} Xreject if lines < 30 && line\_count( signature ) > 9; X.Be XTo get fancy, you could have an \fBif\fP statement add the posters of Xsuch articles to your \fBbadusers\fP database (see above) so that you Xnever hear from them again! In this case you would have to write out Xyour \fBbadusers\fP database at the end of the session. X.H 3 "Followups" X.P XIn some groups, it's better to just ignore the followups. Try X.Bb Xextern int followup; X/* big group switch */ Xcase #rec.humor: X reject if followup; X break; X.Be XYou might not be so harsh, but instead just lower the score or apply Xfurther tests before allowing followups to make it through. X.P XAnother idea is to ignore followups except in the main group on the Xnewsgroup list. 
Try this: X.Bb Xextern int followup; Xextern newsgroup main\_newsgroup; Xreject if followup && main\_newsgroup != newsgroups[0]; X.Be X.H 3 "Two Out of Three Ain't Bad" X.P XYou can use integer arithmetic in combination with the fact that Xconditional expressions return 1 for true and 0 for false. To accept Xan article that has 2 out of 3 keywords in the subject: X.Bb Xextern string subject; X{procgap} Xaccept if (subject has "baz") + (subject has "bar") + (subject has "foo") > 1; X.Be X.H 3 "Patterns of Groups" X.P XYou can get pretty fancy with what you do with crossposted articles. In Xfact, with the right use of NewsClip, crossposting could be a good Xthing. Say you want to only see space articles that also pertain to Xastronomy. You could either use \fBis sci.space && is sci.astro\fP in Xa general expression, or if you use a \fBswitch\fP, you could say: X.Bb Xcase #sci.astro: X reject if !is sci.space; X.Be XLikewise you could say: X.Bb Xcase #rec.humor: X reject if is talk.bizarre; X.Be Xto eliminate only the messages crossposted to that other group. No Xdoubt \fBreject if is comp.sys.atari.st && is comp.sys.amiga\fP will Xbe popular! Likewise, if people are kind enough to crosspost to X``alt.flame'', that lets you control whether you read the article or not. X.P XUse boolean logic on groups to your heart's content. X X X X.H 1 "Debug & Testing" X.P XAll programs of any complexity will have bugs, and yours will be Xno exception. Your bugs may simply cause articles to be accepted or Xrejected improperly, or they may cause your filter program to crash, Xeither through an infinite loop or an exception. X.H 2 "Segmentation Fault" X.P XThe most frustrating thing to see can be the message ``segmentation fault.'' X(Sometimes ``memory fault.'') XThis means, on Unix, that your program has tried to use memory Ximproperly. This is often the result of an attempt to reference an Xarray, string or userid that has a \fBnil\fP value. 
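X.P
XA minimal sketch of the kind of guard that avoids this, using the
X\fBkeywords\fP header array purely as an example variable. The
X\fBextern\fP declaration shown is assumed to follow the same form as the
Xone used for \fBreferences\fP elsewhere in this manual.
X.Bb
Xextern string array keywords;
Xprocedure article()
X{
X    /* test for nil first, then for at least one element */
X    if( keywords != nilarray && count( keywords ) > 0 )
X        dprintf( "first keyword: %s\\n", keywords[0] );
X}
X.Be
XThe nil test has to come before the index. Reading \fBkeywords[0]\fP from
Xa nil array, or from an array with no elements, is just the sort of
Xreference that can produce the fault.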
X.P XYou must remember that before you ever reference data in an array or Xstring that might not be defined, you must check that it is defined. X.P XThere is a difference between \fBnilstring\fP and the empty string X(\fB""\fP). For example, if you use the \fBsummary\fP header variable, Xit will be \fBnilstring\fP if the header wasn't there, and \fB""\fP if Xthe header was there, but the summary was blank. X.P XThe same is true for nil arrays. \fBnilarray\fP isn't the same as an Xarray with no elements. For your protection, the current release of XNewsClip has the \fBin\fP and \fBhas\fP operators treat \fBnilarray\fP Xas an empty array, but this is not guaranteed to work in future releases. X.P XWe do allow a nil database to be the same as an empty database when it Xcomes to looking in the database, but you can't use a nil database for Xstoring into -- you could get that ``segmentation fault.'' X.P XOther causes of this error include: array indices that are out of bounds, Xor a character index beyond the end of a string. X.P XAlways beware of the most common cause, which is the use of a variable Xthat has not yet been assigned a value. X X.H 2 "Debuggers" X.P XIf you can't figure out the immediate cause of a problem like this, and Xyou are a C programmer, Unix has many debugging tools available to help Xwith this sort of problem. X.P XThe C source produced by \fBncc\fP is fairly readable, and you should Xbe able to readily tell what line of the output C program corresponds Xto a statement in your NewsClip program. Use the \fI-l\fP option Xof \fBncc\fP to generate a standalone C program. You can then Xcompile and link it with the \fBnewsclip.a\fP library yourself, using Xwhatever debug options you desire. X X.H 2 "Dprintf" X.P XNewsClip contains a special procedure called \fBdprintf\fP. This acts Xjust like the \fBprintf\fP function from C, except it prints to the Xstandard error output. It takes a variable number of arguments, from X1 to 5. 
These can be strings, ints or dates. See the man page for X\fBprintf\fP for full details. X.P XInsert debugging print statements in your programs so you can figure out Xwhat's going on and what values are being assigned to variables. X.P XPlease note that you can't print variables of type \fBnewsgroup\fP Xor \fBuserid\fP. Assign such values to strings first. Alternatively, you Xcan print newsgroups with the ``%d'' code, which will give the newsgroup Xnumber. X X.H 2 "Warning Level" X.P XYou can set the warning level for your NewsClip programs with the X\fIwarning=num\fP option. Provide a number. The higher the number, Xthe more warnings you get. The default level is 1, and currently Xwarnings exist at levels 0 through 4. Select a high number like 100 Xto get all warnings. X.P XYou will be warned about conditions that are normally considered OK, Xsuch as the reading of a non-existent database file, but you may also Xlearn some useful debugging information. X X.H 2 "Trial Runs" X.P XTo test and debug your programs, use the \fIfilter\fP or \fIlist\fP modes Xof operation. We suggest \fIfilter\fP for preliminary testing. X.P XTo do this, prepare a list of article filenames, either with articles Xmade up by you or live articles on your system. Use absolute pathnames Xif possible. Start perhaps with only one article in the list. Run: X.Bb Xnclip m=filter <list X.Be Xon an \fBnclip\fP program that is full of \fBdprintf\fP statements. You Xshould quickly be able to see what's happening as your program runs, and Xfind out how to fix it. X.P XIf you start working on larger lists of files, include a statement like: X.Bb Xextern string article\_filename; Xdprintf( "%s\\n", article\_filename ); X.Be Xat the start of your \fBarticle\fP procedure, so that you know what article Xyour program is working on when it goes wrong. X.P XThis variable is also a good one to look at if you're using a debugger. 
X.P XLater, you may be ready for the \fIlist\fP mode or even the \fInewsrc\fP mode, Xperhaps in combination with the \fInewsrc=\fP option. We suggest the latter Xoption, as you should not run test programs on your real-live \fB.newsrc\fP Xfile. X.P XEven if your machine does not keep news around, and you only use XNewsClip programs in combination with newsreaders that know how to Xtalk to them, you can still make up your own sample articles (or have Xyour newsreader save out article files on your own machine) and run tests Xon them as you please. X X.H 2 "Debugging in Pipe Mode" X.P XIt can be very difficult to debug a problem that only develops when your Xprogram runs in communication with a newsreader. The only indication you Xmay get of a problem is that the news filter process stops running, and commands Xto it fail to work. A good news reader will inform you that the filter has Xdied, but this may go by so quickly on the screen you can't spot it. X.P XTo debug in pipe mode, you must set the environment variable \fBNCLIPDEBUG\fP Xto ``truepipe''. Then run a newsreading session. This will leave two Xfiles in your ``dot'' directory (the directory where the \fB.newsrc\fP is) Xnamed \fBinpipe\fP and \fBpipelog\fP. The \fBpipelog\fP file is Xan expanded log of the news filter's discussions with the newsreader. XLook at the end of it to see what your filter was doing when it died. X.P XThe \fBinpipe\fP file is the most important for duplicating the problem. XCopy it somewhere safe, as future sessions might overwrite it. Say you Xput it in \fBbadbug\fP; start up your debugger on your \fBnclip\fP program Xand run it with the arguments: X.Bb Xmode=pipe <badbug X.Be XThis should cause your reading session to be duplicated, so long as no Xnews that you processed has since expired. You will see the news filter's Xpipe responses to the newsreader printed on the standard output. 
Most Ximportantly, your filter program will now fail inside the debugger, where Xyou can track down what is going on. END_OF_FILE if test 50212 -ne `wc -c <'doc/man.mm.3'`; then echo shar: \"'doc/man.mm.3'\" unpacked with wrong size! fi # end of 'doc/man.mm.3' fi echo shar: End of archive 13 \(of 15\). cp /dev/null ark13isdone MISSING="" for I in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have unpacked all 15 archives. rm -f ark[1-9]isdone ark[1-9][0-9]isdone else echo You still need to unpack the following archives: echo " " ${MISSING} fi ## End of shell archive. exit 0