[news.software.b] newshist problem?

nash@ucselx.sdsu.edu (Ron Nash) (05/31/91)

I seem to have a small problem with nstats.  I get the message:

Usage: newshist [-f historyfile] messageid ...

whenever I run nstats on my log file since running addmissing.  I have 
been running C News patched thru 24-Mar-1991 for a while with no 
problems until I ran addmissing.  I am using DBZ and NNTP1.5.10
on a BSD 4.3 system (Elxsi 6400).  Expire seems to be working OK.
newshist is complaining at this point:

case "$1" in
-*)     echo "case b $1 $usage" >&2 ; exit 2    ;;

with "$1" being a "--".  

I am running Nstats (a perl script) Version 1.2  (10/17/89). 

Any ideas?   Nstats does print its report and it looks reasonable.


-- 
Ron Nash 	San Diego State University
Internet:  	nash@ucselx.sdsu.edu
Gin-N-Tonic	5 year old 1/2 Arab endurance prospect
Luv on Fire	8 year old Arab, trusty steed and friend

mark@comp.vuw.ac.nz (Mark Davies) (05/31/91)

In article <1991May30.194402.24916@ucselx.sdsu.edu>, nash@ucselx.sdsu.edu (Ron Nash) writes:
|> I seem to have a small problem with nstats.  I get the message:

|> Usage: newshist [-f historyfile] messageid ...

|> whenever I run nstats on my log file since running addmissing.  I have 
|> been running C News patched thru 24-Mar-1991 for a while with no 
|> problems until I ran addmissing.  I am using DBZ and NNTP1.5.10
|> on a BSD 4.3 system (Elxsi 6400).  Expire seems to be working OK.
|> newshist is complaining at this point:

|> case "$1" in
|> -*)     echo "case b $1 $usage" >&2 ; exit 2    ;;
|> 
|> with "$1" being a "--".  

|> I am running Nstats (a perl script) Version 1.2  (10/17/89). 

|> Any ideas?   Nstats does print its report and it looks reasonable.

In the latest C News patches newshist became a shell script and stopped
supporting the -- that signifies end of options (previously handled by
getopt).  If you look in nstats where it invokes newshist, you can simply
take out the -- and things will work.  Be warned, however, that performance
will be terrible, as you will have a separate invocation of dbz for each
message-id in your log file.
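
[Sketch: for anyone who would rather have a wrapper script accept the --
than edit nstats, an option loop along the following lines would do it.
This is not the C News newshist source.  The function name parse_args and
the default history path are assumptions made purely for illustration.]

```sh
#!/bin/sh
# Hypothetical option parser, NOT the real newshist.
# Accepts -f historyfile and treats "--" as the end of options.
parse_args() {
    hist=/usr/lib/news/history          # assumed default path
    while [ $# -gt 0 ]; do
        case "$1" in
        --)  shift; break ;;            # end of options: the rest are ids
        -f)  if [ $# -lt 2 ]; then
                 echo "Usage: newshist [-f historyfile] messageid ..." >&2
                 return 2
             fi
             hist=$2; shift 2 ;;
        -*)  echo "Usage: newshist [-f historyfile] messageid ..." >&2
             return 2 ;;
        *)   break ;;                   # first non-option argument
        esac
    done
    echo "history=$hist ids=$*"
}

parse_args -- '<1@example.com>' '<2@example.com>'
```

The -- pattern has to come before the -* pattern, since case takes the
first match; with only the -* arm present, "--" falls into the usage
message, which is the behaviour Ron reported.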

To get better performance I made the following change to nstats.  Rather
than use newshist, I now use a little helper program called nhistls.  I
don't have the original nstats here any more, so I can't give you diffs, but
the whole thing is pretty small, so I will include it here:

[Sidenote: This is the script that I run that notified me of the articles
with bogus dates coming from mccall.com (and other sites).  Of course given
the amount of complaining tp@mccall.com has done since, perhaps I should
never have notified him ;-)]

cheers
mark

--------start of nhistls----------------
#!/bin/sh
for mid
do
	echo "<$mid>"
done | /usr/lib/newsbin/dbz -ix /usr/lib/news/history
--------end of nhistls----------------
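
[Sketch: the point of nhistls is that every message-id goes down one pipe
to a single dbz process, rather than starting one dbz per id.  The shape
can be tried on a machine without a news system by letting cat stand in
for dbz.  The function name feed_ids and the ids below are made up for
illustration.]

```sh
#!/bin/sh
# feed_ids: same loop as nhistls, but the lookup command is passed in as $1
# so that something harmless (cat) can stand in for dbz.
feed_ids() {
    lookup=$1; shift
    for mid                         # no "in ...": loops over the remaining args
    do
        printf '<%s>\n' "$mid"      # printf sidesteps echo's backslash handling
    done | $lookup                  # one process reads the whole batch
}

feed_ids cat '1@example.com' '2@example.com'
```

On a real system the first argument would be the dbz command line used in
nhistls above; $lookup is left unquoted on purpose so that a command with
options is split into words.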
--------start of nstats----------------
#!/usr/local/bin/perl
#
# Nstats - Print C news statistics via Perl
#
# Version 1.2  (10/17/89)
#
#
#
# Author's notes:
#
# Constructive comments and enhancements are solicited (flames are not).
# Please send suggestions or enhancements to denny@mcmi.
#
# Larry Wall has a Very Nice Work in Perl.  Many thanks to him.
#
# Denny Page, 1989
#
#
#
# Program notes:
#
# The simplest usage is 'perl nstats ~news/log'.  I leave you to find
# more complicated invocations.
#
# While a duplicate is actually a rejected message, it is treated
# separately here.  Rejected messages herein are messages that are not
# subscribed to in the sys file or are excluded in the active file.
#
# Junked messages are not displayed in the system summaries.  It's not
# your neighbor's fault that you are missing active file entries.  If
# you are concerned about receiving junk groups, exclude them in your
# sys or active file.  They will then be summarized :-).
#
# The reason for a newsgroup being bad is assigned only once.  If the
# reason changes later in the log (such as the sys file being modified
# such that a newsgroup is no longer rejected, but rather is filed in
# junk), no notice will be taken.
#
# Calls to newshist are batched, about 50 message-ids per call (see the
# $#id_cache test below).  This may need to be adjusted at some sites.
#
# Sitenames are truncated to 15 characters.  This could be done better.
#
#
# Output headers have the following meanings:
#
#   System	Name of the neighboring system.
#   Accept	Number of accepted articles from system.
#   Dup		Number of duplicate articles received from system.
#   Rej		Number of rejected articles from system.
#   Sent	Number of articles sent to system.
#   Sys%	Accepted (or duplicate or rejected) articles as a
#		percentage of total articles from that system.
#   Tot%	Accepted (or duplicate) articles as a percentage
#		of total accepted (or duplicate) articles.
#   Avl%	Number of articles sent as a percentage of total
#		available (accepted) articles.
#
############################################################
#
# Revision history:
#
# 09/24/89	dny	Initial version
# 09/28/89	dny	Added category totals
# 10/02/89	dny	Fixed link count bug in record_groups
# 10/03/89	dny	Cleaned up variable names
# 10/16/89	dny	Renamed variables - Perl 3.0
# 10/17/89	dny	Fixed bug in rejection counts
# 04/18/91	mark@comp.vuw.ac.nz 
#			speedups replacing newshist now
#			that it is a shell script
#
############################################################


################ ***** Change this ***** ###################
#
$newshist="/usr/lib/newsbin/acct/nhistls";
#
############################################################


# Record the category of a list of message-ids
sub record_groups {
    open(newshist, "-|") || exec $newshist, @_;

    $batchcnt = $#_ + 1;
    while (<newshist>) {
	if (s/^.+\t.+\t(.+)\n$/$1/) {
	    $batchcnt--;
	    foreach $link (split(/ /)) {
		$link =~ s/^([^\.\/]+).*/$1/;
		$category{$link}++;
	    }
	}
    }
    $category{"*expired*"} += $batchcnt;
    close(newshist);
}

############################################################

$#id_cache = -1;

while (<>) {
    ($from, $action, $message_id, $text) =
        /^.+\s(\S+)\s(.)\s<(.+)>\s(.*)$/;
    $from = substr($from, 0, 15);

# Accepted message
    if ($action eq '+') {
	$accepted{$from}++;
        foreach $site (split(/ /, $text)) {
	    $site = substr($site, 0, 15);
	    $sent{$site}++;
	}

	$id_cache[++$#id_cache] = $message_id;
	unless ($#id_cache < 50) {
	    do record_groups(@id_cache);
	    $#id_cache = -1;
	}
	next;
    }
    elsif ($action eq '-') {
# Duplicate
	if ($text eq 'duplicate') {
	    $duplicates{$from}++;
	    next;
	}
	$rejected{$from}++;
# Group not in sys
	if ($text =~ s/no subscribed groups in `(.+)'/$1/) {
            foreach $group (split(/,/, $text)) {
		if ($badgroup{$group}++ == 0) {
		    $badgroup_reason{$group} = "not subscribed in sys";
		}
	    }
	    next;
	}
# Group excluded in active
	elsif ($text =~ s/all groups `(.+)' excluded in active/$1/) {
            foreach $group (split(/,/, $text)) {
		if ($badgroup{$group}++ == 0) {
		    $badgroup_reason{$group} = "excluded in active";
		}
	    }
	    next;
	}
    }
# Junked message
    elsif ($action eq 'j') {
	$junk{$from}++;
	if ($text =~ s/junked due to groups `(.+)'/$1/) {
            foreach $group (split(/,/, $text)) {
		if ($badgroup{$group}++ == 0) {
		    $badgroup_reason{$group} = "not in active (junked)";
		}
	    }
	    next;
	}
    }
# Ignore ihave/sendme messages
    elsif ($action eq 'i') {next;}
    elsif ($action eq 's') {next;}

# Unknown input line
    print $_;
}


if ($#id_cache >= 0) {
    do record_groups(@id_cache);
}


# Collect all sitenames and calc totals
foreach $system (keys(accepted)) {
    $systems{$system} = 1;
    $total_accepted += $accepted{$system};
}
foreach $system (keys(duplicates)) {
    $systems{$system} = 1;
    $total_duplicates += $duplicates{$system};
}
foreach $system (keys(rejected)) {
    $systems{$system} = 1;
    $total_rejected += $rejected{$system};
}
foreach $system (keys(sent)) {
    $systems{$system} = 1;
    $total_sent += $sent{$system};
}
$total_articles = $total_accepted + $total_duplicates + $total_rejected;



# Print system summaries
print "\nSystem             Accept sys% tot%    Dup sys% tot%    Rej sys%     Sent avl%\n";

foreach $system (sort keys(systems)) {
    $articles = $accepted{$system} + $duplicates{$system} + $rejected{$system};

    if ($accepted{$system} > 0) {
	$accepted_pct = ($accepted{$system} * 100) / $articles + 0.5;
	$accepted_totpct = ($accepted{$system} * 100) / $total_accepted + 0.5;
    }
    else {
	$accepted_pct = 0;
	$accepted_totpct = 0;
    }
    if ($duplicates{$system} > 0) {
	$duplicates_pct = ($duplicates{$system} * 100) / $articles + 0.5;
	$duplicates_totpct = ($duplicates{$system} * 100) / $total_duplicates + 0.5;
    }
    else {
	$duplicates_pct = 0;
	$duplicates_totpct = 0;
    }
    if ($rejected{$system} > 0) {
	$rejected_pct = ($rejected{$system} * 100) / $articles + 0.5;
    }
    else {
	$rejected_pct = 0;
    }
    if ($sent{$system} > 0) {
	$sent_pct = ($sent{$system} * 100) / $total_accepted + 0.5;
    }
    else {
	$sent_pct = 0;
    }

    printf "%-15s     %5d %3d%% %3d%%   %4d %3d%% %3d%%   %4d %3d%%    %5d %3d%%\n",
	$system,
	$accepted{$system}, $accepted_pct, $accepted_totpct,
	$duplicates{$system}, $duplicates_pct, $duplicates_totpct,
	$rejected{$system}, $rejected_pct,
	$sent{$system}, $sent_pct;
}


if ($total_accepted > 0) {
    $accepted_pct = ($total_accepted * 100) / $total_articles + 0.5;
}
else {
    $accepted_pct = 0;
}
if ($total_rejected > 0) {
    $rejected_pct = ($total_rejected * 100) / $total_articles + 0.5;
}
else {
    $rejected_pct = 0;
}
if ($total_duplicates > 0) {
    $duplicates_pct = ($total_duplicates * 100) / $total_articles + 0.5;
}
else {
    $duplicates_pct = 0;
}

printf "TOTALS              %5d %3d%%        %4d %3d%%        %4d %3d%%    %5d\n",
$total_accepted, $accepted_pct,
$total_duplicates, $duplicates_pct,
$total_rejected, $rejected_pct,
$total_sent;



# Display any bad newsgroups received
@keys = sort(keys(badgroup));
if ($#keys >= 0) {
    print "\n\nBad Newsgroups                    Articles    Reason\n";
    foreach $group (@keys) {
	printf "%-35s   %4d    %s\n",
	    $group, $badgroup{$group}, $badgroup_reason{$group};
    }
}


# Display news categories received
@keys = sort(keys(category));
if ($#keys >= 0) {
    print "\n\nCategories Received               Articles\n";
    foreach $group (@keys) {
	printf "%-35s   %4d\n",
	    $group, $category{$group};
    }
}
--------end of nstats----------------

merlyn@iWarp.intel.com (Randal L. Schwartz) (06/02/91)

In article <1991May31.000943.10583@comp.vuw.ac.nz>, mark@comp (Mark Davies) writes:
| In the latest C News patches newshist became a shell script and stopped
| supporting the -- that signifies end of options (previously handled by
| getopt).  If you look in nstats where it invokes newshist, you can simply
| take out the -- and things will work.  Be warned, however, that performance
| will be terrible, as you will have a separate invocation of dbz for each
| message-id in your log file.
| 
| To get better performance I made the following change to nstats.  Rather
| than use newshist, I now use a little helper program called nhistls.  I
| don't have the original nstats here any more, so I can't give you diffs, but
| the whole thing is pretty small, so I will include it here:

Gack.  Starting a sh that did a bunch of echoes seemed a little
wasted, so I did it in Perl.  (The whole thing shows a zillion
syscalls with trace, so there's bound to be a few more optimizations.)

Here's my tweak...

################################################## snip snip
#!/usr/bin/perl
#
# Nstats - Print C news statistics via Perl
#
# Version 1.2  (10/17/89)
#
#
#
# Author's notes:
#
# Constructive comments and enhancements are solicited (flames are not).
# Please send suggestions or enhancements to denny@mcmi.
#
# Larry Wall has a Very Nice Work in Perl.  Many thanks to him.
#
# Denny Page, 1989
#
#
#
# Program notes:
#
# The simplest usage is 'perl nstats ~news/log'.  I leave you to find
# more complicated invocations.
#
# While a duplicate is actually a rejected message, it is treated
# separately here.  Rejected messages herein are messages that are not
# subscribed to in the sys file or are excluded in the active file.
#
# Junked messages are not displayed in the system summaries.  It's not
# your neighbor's fault that you are missing active file entries.  If
# you are concerned about receiving junk groups, exclude them in your
# sys or active file.  They will then be summarized :-).
#
# The reason for a newsgroup being bad is assigned only once.  If the
# reason changes later in the log (such as the sys file being modified
# such that a newsgroup is no longer rejected, but rather is filed in
# junk), no notice will be taken.
#
# Calls to newshist are batched, about 50 message-ids per call (see the
# $#id_cache test below).  This may need to be adjusted at some sites.
#
# Sitenames are truncated to 15 characters.  This could be done better.
#
#
# Output headers have the following meanings:
#
#   System	Name of the neighboring system.
#   Accept	Number of accepted articles from system.
#   Dup		Number of duplicate articles received from system.
#   Rej		Number of rejected articles from system.
#   Sent	Number of articles sent to system.
#   Sys%	Accepted (or duplicate or rejected) articles as a
#		percentage of total articles from that system.
#   Tot%	Accepted (or duplicate) articles as a percentage
#		of total accepted (or duplicate) articles.
#   Avl%	Number of articles sent as a percentage of total
#		available (accepted) articles.
#
############################################################
#
# Revision history:
#
# 09/24/89	dny	Initial version
# 09/28/89	dny	Added category totals
# 10/02/89	dny	Fixed link count bug in record_groups
# 10/03/89	dny	Cleaned up variable names
# 10/16/89	dny	Renamed variables - Perl 3.0
# 10/17/89	dny	Fixed bug in rejection counts
# 04/18/91	mark@comp.vuw.ac.nz 
#			speedups replacing newshist now
#			that it is a shell script
# 06/01/91      merlyn@iWarp.intel.com
#                       replaced mark's shell script with Perl code
#
############################################################

############################################################


# Record the category of a list of message-ids
sub record_groups {
    local(@ids) = @_;

    grep(!/</ && s/.*/<$&>/, @ids);
    local($ids) = join("\n",@ids);
    open(newshist, "-|") || exec <<"PERL_EOF";
/usr/lib/newsbin/dbz -ix /usr/lib/news/history <<SH_EOF
$ids
SH_EOF
PERL_EOF

    $batchcnt = $#_ + 1;
    while (<newshist>) {
	if (s/^.+\t.+\t(.+)\n$/$1/) {
	    $batchcnt--;
	    foreach $link (split(/ /)) {
		$link =~ s/^([^\.\/]+).*/$1/;
		$category{$link}++;
	    }
	}
    }
    $category{"*expired*"} += $batchcnt;
    close(newshist);
}

############################################################

$#id_cache = -1;

while (<>) {
    ($from, $action, $message_id, $text) =
        /^.+\s(\S+)\s(.)\s<(.+)>\s(.*)$/;
    $from = substr($from, 0, 15);

# Accepted message
    if ($action eq '+') {
	$accepted{$from}++;
        foreach $site (split(/ /, $text)) {
	    $site = substr($site, 0, 15);
	    $sent{$site}++;
	}

	$id_cache[++$#id_cache] = $message_id;
	unless ($#id_cache < 50) {
	    do record_groups(@id_cache);
	    $#id_cache = -1;
	}
	next;
    }
    elsif ($action eq '-') {
# Duplicate
	if ($text eq 'duplicate') {
	    $duplicates{$from}++;
	    next;
	}
	$rejected{$from}++;
# Group not in sys
	if ($text =~ s/no subscribed groups in `(.+)'/$1/) {
            foreach $group (split(/,/, $text)) {
		if ($badgroup{$group}++ == 0) {
		    $badgroup_reason{$group} = "not subscribed in sys";
		}
	    }
	    next;
	}
# Group excluded in active
	elsif ($text =~ s/all groups `(.+)' excluded in active/$1/) {
            foreach $group (split(/,/, $text)) {
		if ($badgroup{$group}++ == 0) {
		    $badgroup_reason{$group} = "excluded in active";
		}
	    }
	    next;
	}
    }
# Junked message
    elsif ($action eq 'j') {
	$junk{$from}++;
	if ($text =~ s/junked due to groups `(.+)'/$1/) {
            foreach $group (split(/,/, $text)) {
		if ($badgroup{$group}++ == 0) {
		    $badgroup_reason{$group} = "not in active (junked)";
		}
	    }
	    next;
	}
    }
# Ignore ihave/sendme messages
    elsif ($action eq 'i') {next;}
    elsif ($action eq 's') {next;}

# Unknown input line
    print $_;
}


if ($#id_cache >= 0) {
    do record_groups(@id_cache);
}


# Collect all sitenames and calc totals
foreach $system (keys(accepted)) {
    $systems{$system} = 1;
    $total_accepted += $accepted{$system};
}
foreach $system (keys(duplicates)) {
    $systems{$system} = 1;
    $total_duplicates += $duplicates{$system};
}
foreach $system (keys(rejected)) {
    $systems{$system} = 1;
    $total_rejected += $rejected{$system};
}
foreach $system (keys(sent)) {
    $systems{$system} = 1;
    $total_sent += $sent{$system};
}
$total_articles = $total_accepted + $total_duplicates + $total_rejected;



# Print system summaries
print "\nSystem             Accept sys% tot%    Dup sys% tot%    Rej sys%     Sent avl%\n";

foreach $system (sort keys(systems)) {
    $articles = $accepted{$system} + $duplicates{$system} + $rejected{$system};

    if ($accepted{$system} > 0) {
	$accepted_pct = ($accepted{$system} * 100) / $articles + 0.5;
	$accepted_totpct = ($accepted{$system} * 100) / $total_accepted + 0.5;
    }
    else {
	$accepted_pct = 0;
	$accepted_totpct = 0;
    }
    if ($duplicates{$system} > 0) {
	$duplicates_pct = ($duplicates{$system} * 100) / $articles + 0.5;
	$duplicates_totpct = ($duplicates{$system} * 100) / $total_duplicates + 0.5;
    }
    else {
	$duplicates_pct = 0;
	$duplicates_totpct = 0;
    }
    if ($rejected{$system} > 0) {
	$rejected_pct = ($rejected{$system} * 100) / $articles + 0.5;
    }
    else {
	$rejected_pct = 0;
    }
    if ($sent{$system} > 0) {
	$sent_pct = ($sent{$system} * 100) / $total_accepted + 0.5;
    }
    else {
	$sent_pct = 0;
    }

    printf "%-15s     %5d %3d%% %3d%%   %4d %3d%% %3d%%   %4d %3d%%    %5d %3d%%\n",
	$system,
	$accepted{$system}, $accepted_pct, $accepted_totpct,
	$duplicates{$system}, $duplicates_pct, $duplicates_totpct,
	$rejected{$system}, $rejected_pct,
	$sent{$system}, $sent_pct;
}


if ($total_accepted > 0) {
    $accepted_pct = ($total_accepted * 100) / $total_articles + 0.5;
}
else {
    $accepted_pct = 0;
}
if ($total_rejected > 0) {
    $rejected_pct = ($total_rejected * 100) / $total_articles + 0.5;
}
else {
    $rejected_pct = 0;
}
if ($total_duplicates > 0) {
    $duplicates_pct = ($total_duplicates * 100) / $total_articles + 0.5;
}
else {
    $duplicates_pct = 0;
}

printf "TOTALS              %5d %3d%%        %4d %3d%%        %4d %3d%%    %5d\n",
$total_accepted, $accepted_pct,
$total_duplicates, $duplicates_pct,
$total_rejected, $rejected_pct,
$total_sent;



# Display any bad newsgroups received
@keys = sort(keys(badgroup));
if ($#keys >= 0) {
    print "\n\nBad Newsgroups                    Articles    Reason\n";
    foreach $group (@keys) {
	printf "%-35s   %4d    %s\n",
	    $group, $badgroup{$group}, $badgroup_reason{$group};
    }
}


# Display news categories received
@keys = sort(keys(category));
if ($#keys >= 0) {
    print "\n\nCategories Received               Articles\n";
    foreach $group (@keys) {
	printf "%-35s   %4d\n",
	    $group, $category{$group};
    }
}
################################################## snip snip

Just another Perl and Cnews hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/

mark@comp.vuw.ac.nz (Mark Davies) (06/02/91)

In article <1991Jun1.193655.544@iWarp.intel.com>, merlyn@iWarp.intel.com
(Randal L. Schwartz) writes:
|> Gack.  Starting a sh that did a bunch of echoes seemed a little
|> wasted, so I did it in Perl.  (The whole thing shows a zillion
|> syscalls with trace, so there's bound to be a few more
|> optimizations.)

Thanks Randal.  That was the obvious next step.  My only excuse is that
my copy of the Perl book was on loan at that time.

Here is a small patch to your code to handle Message-IDs that contain
"`" and other sh meta chars.
With your current version you get
	"sh: syntax error at line 2: `newline or ;' unexpected"
and similar messages popping out on standard error.
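
[Sketch: the difference comes down to how sh treats the body of a
here-document.  With a bare delimiter the body undergoes parameter
expansion and command substitution; with a quoted delimiter it is passed
through verbatim.  The example below uses a $ rather than a backquote so
that it runs cleanly in any sh.  The message-id and the variable name are
made up for illustration.]

```sh
#!/bin/sh
site=oops    # hypothetical variable whose name happens to appear in the id

# Bare delimiter: sh expands $site inside the body.
unquoted() {
cat <<SH_EOF
<123$site@example.com>
SH_EOF
}

# Quoted delimiter: the body reaches cat untouched.
quoted() {
cat <<'SH_EOF'
<123$site@example.com>
SH_EOF
}

unquoted    # prints <123oops@example.com>
quoted      # prints <123$site@example.com>
```

An unmatched backquote in the unquoted form is worse than a stray $: sh
tries to start a command substitution and complains, which is where the
syntax error quoted above comes from.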

cheers
mark

ps.  Aren't ambiguous date formats fun (see the entries below in the
revision history).

*** 4696	Sun Jun  2 12:17:41 1991
--- nstats	Sun Jun  2 12:55:32 1991
***************
*** 71,76 ****
--- 71,79 ----
  #			that it is a shell script
  # 06/01/91      merlyn@iWarp.intel.com
  #                       replaced mark's shell script with Perl code
+ # 02/06/91      mark@comp.vuw.ac.nz
+ #                       fix randal's code to handle Message-IDs
+ #                       of the form <`=C*~5+@cck.cov.ac.uk>
  #
  ############################################################
  
***************
*** 84,90 ****
      grep(!/</ && s/.*/<$&>/, @ids);
      local($ids) = join("\n",@ids);
      open(newshist, "-|") || exec <<"PERL_EOF";
! /usr/lib/newsbin/dbz -ix /usr/lib/news/history <<SH_EOF
  $ids
  SH_EOF
  PERL_EOF
--- 87,93 ----
      grep(!/</ && s/.*/<$&>/, @ids);
      local($ids) = join("\n",@ids);
      open(newshist, "-|") || exec <<"PERL_EOF";
! /usr/lib/newsbin/dbz -ix /usr/lib/news/history <<'SH_EOF'
  $ids
  SH_EOF
  PERL_EOF

bill@unixland.natick.ma.us (Bill Heiser) (06/02/91)

In article <1991Jun1.193655.544@iWarp.intel.com> merlyn@iWarp.intel.com (Randal L. Schwartz) writes:
>
>Gack.  Starting a sh that did a bunch of echoes seemed a little
>wasted, so I did it in Perl.  (The whole thing shows a zillion
>syscalls with trace, so there's bound to be a few more optimizations.)

I tried to use this on my Esix system, but it failed since I don't have
"dbz".  To you Esix people out there --- "should I" have dbz?  

Is there anything like this (perl script for news stats) that will work
without dbz?

Thanks.
bill
-- 
bill@unixland.natick.ma.us	The Think_Tank BBS & Public Access Unix
    ...!uunet!think!unixland!bill       bill@unixland
    ..!{uunet,bloom-beacon,esegue}!world!unixland!bill
508-655-3848 (2400)   508-651-8723 (9600-HST)   508-651-8733 (9600-PEP-V32)