[news.software.b] back-end inews for NNTP and C news

lamy@ai.utoronto.ca (Jean-Francois Lamy) (09/03/89)

This news site supports posting via mail and nntp; for that reason inews was
split in two conceptually different parts: a front end for use on client
machines and a back-end on the server.

The front-end does very little.  On most machines we simply use the nntp
fake inews with slight hacks to do workstation name hiding.  On the others,
articles are mailed to the server because those machines don't run nntp.

The back end handles the moderated newsgroups goo and actually does the
posting.  It expects an article on stdin, and does not parse any switches.
It does not deal with signatures (because the article may come from a remote
machine whose file systems aren't visible).  It trusts headers and will
generate missing message ids (because the nntp inews does not include those
and relaynews ignores messages with no message ids).
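
For illustration only, here is a minimal sketch of making up a message id
when an article arrives without one.  It is not the code from the script
below: the function name make_msgid and the host news.example.org are
hypothetical, and the id is simply a UTC timestamp, the shell's process id
and a host name glued together.

```shell
#! /bin/sh
# Sketch only: build a Message-ID from a UTC timestamp, the process id of
# this shell and a fully qualified host name.
# make_msgid and its argument are hypothetical names for this example.
make_msgid() {
    host=$1
    stamp=`date -u +%y%m%d.%H%M%S`
    echo "<$stamp.$$@$host>"
}

make_msgid news.example.org
```

Two articles handled by the same process within the same second would get
the same id, so a real back end would want one process per article, as is the
case when the script is run once per posting.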

I have massively reorganized the way inews searches for moderators (you can
now sort the moderators list and use look(1) to grope around if you feel like
it). gngp is not used.  I have merged all the censoring scripts back into
inews.  It invokes awk and sed once on each message (compare with the official
inews :-), and does not use shell constructs that cause forks (if then else
into a pipe, etc.).  And instead of calling gngp a few hundred times on a
moderated newsgroup (1/2 the size of your mailpaths file), this code will
usually do 1 grep or look.
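
To make the lookup idea concrete, here is a small sketch under stated
assumptions.  It assumes a mailpaths-style file with one "group address" pair
per line, tries the group itself and then the catch-all entries all, backbone
and internet, and fills in %s with the group name (dots turned into dashes).
The name find_moderator and the sample addresses are hypothetical, and the
exact-match awk call is a choice made for this example, not the grep or look
that the script itself uses.

```shell
#! /bin/sh
# Sketch only: look up the moderator address for a newsgroup.
# find_moderator, the sample file and its addresses are hypothetical.
find_moderator() {
    ng=$1
    file=$2
    for key in "$ng" all backbone internet ; do
        # exact match on the first field, stop at the first hit
        addr=`awk -v k="$key" '$1 == k { print $2 ; exit }' "$file"`
        if test "$addr" ; then
            # field the %s feature: dots in the group name become dashes
            dashed=`echo "$ng" | tr '.' '-'`
            echo "$addr" | sed -e "s/%s/$dashed/"
            return 0
        fi
    done
    return 1
}

mp=/tmp/mailpaths.$$
cat > $mp <<'EOF'
comp.sources.unix sources-mod@example.org
all %s@moderators.example.org
EOF
find_moderator comp.sources.unix $mp         # explicit entry
find_moderator news.announce.newusers $mp    # falls back to "all"
rm -f $mp
```

With the sample file above, the first call prints sources-mod@example.org and
the second prints news-announce-newusers@moderators.example.org.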

As it is now, this code will support cross-posting between newsgroups and
moderated newsgroups (we support this feature locally because we've had a need
for it).  A primitive attempt to prevent this is in the code and can be
commented out.
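
As a rough sketch of that policy (not the loop from the script below), the
following sorts the groups of a cross-posted article into those posted
directly and those that go to a moderator, using the flag field of
active-file lines (group, high, low, flag).  The function name classify and
the sample lines are hypothetical.

```shell
#! /bin/sh
# Sketch only: split groups into "post" and "mail to moderator" piles.
# classify and the sample active-file lines are hypothetical.
classify() {
    approved=$1       # non-empty if the article carries an Approved: line
    post=
    mail=
    while read ng high low flag ; do
        case "$flag" in
        y)  post=${post:+$post,}$ng ;;
        m)  if test "$approved" ; then
                post=${post:+$post,}$ng     # approved: post normally
            else
                mail=${mail:+$mail,}$ng     # unapproved: goes to moderator
            fi ;;
        esac
    done
    echo "post=$post mail=$mail"
}

classify "" <<'EOF'
comp.lang.c 0001234 0001000 y
comp.sources.unix 0000456 0000400 m
EOF
```

For the sample input this prints post=comp.lang.c mail=comp.sources.unix, so
the article is both posted and mailed, which is the cross-posting behaviour
described above.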

This back end is invoked on the news server, and only on the news server, by
the nntp daemon and the sendmail aliases.  The back end is included at the end
of the message; the front-ends are unpostable because of the local religious
beliefs they enforce (none of our front-ends include signatures, we believe
that's a job for the news reader posting interface, etc.).  This code has been
running at this site for a fair while, but as we say in this business, you're
on your own, and your kilometreage (sic) may vary.

Jean-Francois Lamy               lamy@ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4
-------------------------------------------------------------------------------
#! /bin/sh
#
# server.inews 
#
#   Back-end inews for C news.
#
#   Post article read on stdin.  Arguments are ignored.
#   Meant to be invoked by an NNTP server or by a sendmail/zmailer alias (e.g.
#   feednews: "|/news/bin/server.inews" ).  Mail headers inappropriate for
#   news will be stripped/transformed appropriately.
#
#   The message is mailed to moderators of moderated newsgroups mentioned on
#   the Newsgroups: line, unless a proper Approved: line is seen, in which
#   case we post directly.  We post to non-moderated newsgroups, and deal with
#   Control: messages appropriately.
#
#   We assume that the From: line can be trusted.  A Path: line will be
#   trusted if present.  If not, one will be made up from mail information
#   if available, and used unless posting an approved message to a moderated
#   newsgroup (since that would prevent the posting site from seeing it).
#   The path data will also set the error-return address of the message,
#   for better or for worse.
#
# remotely derived from code by Geoff Collyer, C news alpha release
# heavily(!) reworked for CSRI/ANT environment by Jean-Francois Lamy
#	(lamy@ai.utoronto.ca)

# =()<. ${NEWSCONFIG-@<NEWSCONFIG>@}>()=
. ${NEWSCONFIG-/news/bin/config}
NEWSCTL=/news/lib

PATH=/local/bin:$NEWSCTL/bin:$NEWSBIN:$NEWSBIN/relay:$NEWSPATH
export PATH
umask $NEWSUMASK

# fix according to your site.
hostname=`hostname`		# the usenet name of your site
host="$hostname.toronto.edu"    # if $hostname is not absolute
date=`date`
defdate="`set \`date -u\`; echo $3 $2 \`echo $6 | sed 's/^..//'\` $4 $5`"
defmsgid="`set $date; echo \<$6$2$3.\`echo $4 | tr -d :\`.$$@$host\>`"

input=/tmp/in$$in      	# uncensored input
inhdrs=/tmp/in$$hdr	# headers after censoring
inbody=/tmp/in$$body	# body of message
censart=/tmp/in$$cens   # article with censored headers
nglist=/tmp/in$$act	# active file entries for newsgroups
indefs=/tmp/in$$defs	# shell variable definitions, extracted from message.
logfile=/news/log/inews.log
rmlist='/tmp/in'$$'*'

# programs
relaynews="relaynews -r"
sendmail='/usr/lib/sendmail -t'	# articles plainly mailed to moderators
grep=ngrep
modgrep=ngrep	# use look if $NEWSCTL/mailpaths file is kept sorted.

trap 'rm -f $rmlist; echo '.' >>$logfile' 0 1 2 3 15  # don't leave a mess.
echo -n $date \[$$\] \(`whoami`\) >> $logfile

# capture incoming news in case inews fails
   if cat >>$input; then
      : got it.
   else
      echo "$0: lost news; cat returned status $?" >&2
      exit 1
   fi

echo -n "; " >>$logfile

# Separate header from body, mangling headers as we go.
awk '
    # pass 1 - note presence | absence of certain headers, store body.
    BEGIN		{ body = 0 ; skipping = 0 ; received = 0 }
    body == 1		{ print > "'$inbody'" ; next }
    /^[ \t]*$/		{ body = 1 ; print > "'$inbody'" ; next }

    # don-t bother with various mail headers
    /^To:|^X-To:|^Cc:|^Apparently-To:|^Original-To:/ {skipping = 1 ; next }
    /^Resent-[^:]*:/	{ skipping = 1; next }
    # Normally only one of the following would be present (Path: if coming via
    # NNTP, From_ via mail.  Return-Path is unlikely).
    /^From /		{ hdrval["Path:"] = $2 ; next }
    /^Return-Path:/	{ hdrval["Path:"] = $2 ; next }
    /^Path:/		{ hdrval["Path:"] = $2 ; next }
    /^Message-Id:/	{ $1 = "Message-ID:" }

    # a header keyword: remember it and its value.
    /^[^\t ]*:/ { skipping = 0; received = 0 ; keyword=$1 ;
		  if ( hdrval[$1] == "" )
		      hdrval[$1] = $0
		  else # some headers can be repeated
		      hdrval[keyword] = hdrval[keyword] "\n" $0 }
    # a continuation: concatenate this line to the value
    !/^[^\t ]*:/ { if ( ! skipping )
			hdrval[keyword] = hdrval[keyword] "\n" $0 }
    # pass 2 - deduce & omit & emit headers
    END {
	    subj = "Subject:";		path = "Path:"         
	    ctl = "Control:";		date = "Date:"         
	    ng = "Newsgroups:";		from = "From:"         
	    msgid = "Message-ID:";	org = "Organization:"  
	    typo =  "Message-Id:";	distr = "Distribution:"
	    rcvd = "Received:";		appr = "Approved:"     

	    # fill-in some headers
	    if (hdrval[msgid] == "")
		    hdrval[msgid] = msgid " " "'"$defmsgid"'"
	    # force GMT format
	    hdrval[date] = date " " "'"$defdate"'"

	    # nuke others
	    distworld = distr " world"
	    if (hdrval[distr] == distworld)
		    hdrval[distr] = ""

	    # turn Subject: cmsg into a proper Control: header.
	    if (substr(hdrval[subj],1,14) == "Subject: cmsg ") {
		hdrval[ctl] = ctl " " substr(hdrval[subj],15)
		hdrval[subj] = substr(hdrval[subj],1,9) substr(hdrval[subj],15)
	    }

	    # warn if no Newsgroups:
	    if (hdrval[ng] == "")
		    print "'$0': no Newsgroups: header." | "cat >&2"
	    else {
		    print "allngs=" q hdrval[ng] q > "'$indefs'"
		    hdrval[ng] = ""
	    }

	    # write out various header values so the shell can read them
	    if (hdrval[ctl] != "") {
		    print "ctrl=" q hdrval[ctl] q > "'$indefs'"
		    hdrval[ctl] = ""
	    }
	    if (hdrval[path] != "") {
		    print "path=" q hdrval[path] q > "'$indefs'"
		    hdrval[path] = ""
	    }
	    if (hdrval[appr] != "") {
		    print "appr=" q hdrval[appr] q > "'$indefs'"
	    }
	    if (hdrval[rcvd] != "") {
		    print "rcvd=" q hdrval[rcvd] q > "'$indefs'"
		    hdrval[rcvd] = ""
	    }

	    # Output headers common to news and mail, starting with From:
	    if (hdrval[from] != "") {
		    print hdrval[from]
		    hdrval[from] = ""
	    }
	    # have pity on readers: put Subject: next
	    if (hdrval[subj] != "") {
		    print hdrval[subj]
		    hdrval[subj] = ""
	    }
	    # print misc. headers in random order, unless they are empty.
	    for (i in hdrval)
		    if (hdrval[i] != "" && hdrval[i] != i " ")
			    print hdrval[i]
    }
    ' q=\' $input > $inhdrs

# swallow what we've just learned in the awk script
. $indefs
echo -n $path: $allngs >> $logfile

# produce list of newsgroups for validation purposes
if test "$ctrl" ; then
	echo "control" > $nglist	# a dreadful hack around all.all.ctl
else
        # keep only newsgroups in active; something else ought to have
        # complained about that by now.
	egrep "^(` echo $allngs | sed		\
	       -e 's/^Newsgroups:[ 	]*//' 	\
	       -e 's/ //'			\
	       -e 's/\./\\\\./g' 		\
	       -e 's/+/\\\\+/g'			\
               -e 's/,/ |/g'			\
               -e 's/$/ /'`)" $NEWSCTL/active >$nglist
fi

# sift through newsgroups, gathering moderators
# Whether cross-posting to several mailing lists is allowed
# is a matter of local policy; uncomment code near end of loop to prevent.
exec < $nglist
while read ng high low flag ; do
    ngbak=$ng
    case "$flag" in
    y)
        newsgroups=${newsgroups+$newsgroups,}$ng
        ;;
    m)
        if test "$appr" ; then
            # just post normally
            newsgroups=${newsgroups+$newsgroups,}$ng
        else 
            # no Approved: add moderator to list of mail recipients
            while true ; do
                r=`$modgrep $ng $NEWSCTL/mailpaths`
                if test "$r" ; then
                    set $r
                    moderator=$2
                    break
                elif test "$ng" != "internet" ; then
			# people should just have "all %s@sitename", but since
			# the docs posted to news.admin talk about backbone and
			# internet, we suffer through that too.  Step one
			# fallback per pass: group, all, backbone, internet.
                        case "$ng" in
                        all)      ng=backbone ;;
                        backbone) ng=internet ;;
                        *)        ng=all ;;
                        esac
		else
		        echo "$0: no moderator found for '$ng'" >&2
	                exit 64 # bad usage message.
                fi
	    done

	    # field the %s feature
	    ngbak=`echo $ngbak | tr '.' '-'`
	    moderator=`echo $moderator | sed -e "s/%s/$ngbak/"`
            modroute=${modroute+$modroute,}$moderator

	    # enforce restricted usage (mail to first moderator, don't post
	    # article to normal newsgroups). Uncomment next two lines to use.
	    # newsgroups=""
	    # break
        fi
	;;
    n|*)
        if test "$ctrl" ; then
	    newsgroups=`echo $allngs|sed -e 's/Newsgroups:[ 	]*//'`
	else
            echo "$0: Posting to $ng is not allowed." >&2
            exit 64   # sendmail/Zmailer "bad usage" message
        fi
        ;;
    esac
done

if [ "$newsgroups" ] ; then
    # cook-up the news-specific headers.  Path: is omitted if we are
    # posting to a moderated newsgroup, as this would prevent the site
    # it was mailed from from seeing the message in the newsgroup.
    if test "$appr" ; then
        path=""
    fi
    cat - $inhdrs <<EOF |
${ctrl}${ctrl:+
}${path:+Path: }${path:+$path}${path:+
}Newsgroups: $newsgroups
EOF
    # final header sanitization (no route format in From:, no tabs after :)
    # append body, strip invisible chars.  Strip our host name from path, else
    # articles posted here will not get shown.
    sed -e '/^Path: /s/.*'$hostname'[^!]*!//' \
        -e 's/^From:[ 	]*\(.*\)  *<\(.*\)>/From: \2 \(\1\)/' \
	-e 's/:[ 	]*/: /' \
	-e "\$r $inbody" |
    tr -d '\1-\7\13\14\16-\37' >$censart

    if $relaynews <$censart
    then
       echo -n " relaynews ok" >> $logfile
    else
       status=$?
       echo "$0: article could not be posted (relaynews status $status)" >&2
       echo "$0: failed news in `hostname`:$input " >&2
       echo " relaynews status $status" >> $logfile
       exit $status
    fi
fi

if [ "$modroute" ] ; then
    # mail article to the moderator(s).
    cat - $inhdrs <<EOF |
${rcvd:+$rcvd}${rcvd:+
}To: ${modroute}
EOF
    # final header sanitization (no route format in From:, no tabs after :)
    # strip message id so cross-postings between a newsgroup and a gatewayed
    # mailing list show up in both places
    # append body, strip invisible chars, push to mailer.
    sed -e 's/^From:[ 	]*\(.*\)  *<\(.*\)>/From: \2 \(\1\)/' \
	-e 's/:[ 	]*/: /' \
	-e '/^Message-ID:/d' \
	-e "\$r $inbody" |
    tr -d '\1-\7\13\14\16-\37' >$censart
    $sendmail ${path:+-f$path} <$censart
    echo -n " resent to '$modroute'" >>$logfile
fi