[comp.lang.perl] Arbitron in PERL

spike@world.std.com (Joe Ilacqua) (05/20/91)

	This is a perl version of the arbitron program.  I wrote
parbitron because SUNOS 4.0.3's awk kept core dumping on long .newsrc
lines (under SUNOS 4.1 in prints 'line too long' and aborts).  I would
not say this is faster, but it does work.  I have shown this to Brian
Reid, and felt that parbitron's output was exceptable.  HOWEVER: I
STRONGLY RECOMMEND that you run parbitron with "$summarypath" set to
YOUR email address AND that you compare the output with that of the
shell version before you set it loose.

	Bugs, complaints, and improvements to me.

->Spike (Spike@World.STD.com)

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  parbitron
# Wrapped by spike@world on Sun May 19 17:40:49 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f parbitron -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"parbitron\"
else
echo shar: Extracting \"parbitron\" \(4314 characters\)
sed "s/^X//" >parbitron <<'END_OF_parbitron'
X#!/usr/bin/perl
X# parbitron -- a perl version of the program produces rating sweeps for USENET.
X
X# To participate in the international monthly ratings sweeps, 
X# run "arbitron" every month. Brian Reid runs the statistics program on the
X# first day of each month; it will include any report that has reached it by
X# that time. To make sure your site's data is included, run the survey
X# program no later than the 20th day of each month.
X
X# This version of arbitron was written by Spike (Joe Ilacqua),
X# spike@world.std.com.  It seemed like the right thing to do at the time.
X
X# Arbitron was originally written by Brian Reid, DEC Western Research Lab,
X# reid@decwrl.dec.com]
X
X# Notes: The Perl version of arbitron intentionally does not support:
X#   NN's "~/.nn/rc" file, the current version of NN uses the ".newsrc".
X#   Old B News' 2 field active files.
X#  You should upgrade your software, or run the shell version of arbitron.
X#
X#  As with the shell arbitron, the results of this program are dependent
X#  on the rate at which you expire news.  If you are a small site that
X#  expires news rapidly, the results may indicate fewer active readers
X#  than you actually have.
X
X# Who to send the report to:
X# uucp path: {sun, hplabs, pyramid, decvax, ucbvax}!decwrl!netsurvey
X$summarypath = 'netsurvey@decwrl.dec.com';
X
X# Range of /etc/passwd UID's that represent actual people (rather than
X# maintenance accounts or daemons or whatever)
X$lowUID = 5;
X$highUID = 9999;
X
X# If you need to get the active file from another host define activehost.
X# The user running parbitron must be able to rsh(1) to the remote host.
X#$activehost = 'foo';
X
X$active = '/usr/lib/news/active';
X$active = "rsh $activehost cat $active|" if ($activehost);
X
X$users = 0;			# Users who could read news.
X$newsreaders = 0;		# Users who do read news.
X
Xchop($date = `date`);
X($wday,$mon,$day,$hour,$tz,$year) = split(' ',$date);
X$dat="$mon$year";
X
X# One of these should return the hostname.
Xchop($hostname = `hostname || uname -n || uuname -l`);
X
Xopen(ACTIVE,$active) || die "Can't open active file: $!\n";
X
Xwhile(<ACTIVE>)
X{
X    next unless /^[a-z][-0-9_a-z]*\./; # from shell arbitron
X    ($group,$maximum,$minimum) = split;
X    $groupcount{$group} = 0;
X    $groupmax{$group} = $maximum;
X    $groupmin{$group} = $minimum;
X}
Xclose(ACTIVE);
X
Xwhile (($user,$pass,$uid,$gid,$quota,$com,$gcos,$dir) = getpwent) {
X    next if ($uid < $lowUID) || ($uid > $highUID);
X    $users++;
X
X    next if $homes{$dir};	# Don't do a .newsrc twice
X    $homes{$dir} = 1;
X    next if (! -r "$dir/.newsrc");
X    open(NEWSRC,"$dir/.newsrc") || next; # This shouldn't fail
X
X    $counted = 0;
X
X    while(<NEWSRC>) {
X	next if (!/: [0-9]/);
X	($group,$arts) = split;
X	$group =~ s/://;
X	next unless defined($groupcount{$group}); # bogus group
X	next if $hits{$group};	# Don't count a group twice
X	$hits{$group} = 1;
X
X	$maximum = $groupmax{$group};
X	$minimum = $groupmin{$group};
X	next if $minimum == $maximum; # No articles if $minimum == $maximum
X
X# We want the last article read from the line in the .newsrc, it is
X# a comma septated number or range (i.e ...,415 or ...,3001-3078)
X
X	@arts = split(',',$arts); # Split the line up on ","s
X# Spilt the last element on "-" if need be
X	@arts = split('-',$arts[$#arts]) if ($arts[$#arts] =~ /-/);
X	if (($arts[$#arts] >= $groupmin{$group})
X	    && ($arts[$#arts] <= $groupmax{$group})) {
X	    $groupcount{$group}++;
X	    if (!$counted) {
X		$newsreaders++;	# We have found another reader of news
X		$counted++;	# only count them once!
X	    }
X	}
X    }
X    undef %hits;
X    close(NEWSRC);
X}
X
Xundef %groupmax;
Xundef %groupmin;
Xundef %homes;
X
X$i = 0;
X
Xwhile (($group,$count) = each %groupcount) {
X    $tosort[$i++] = "$count $group";
X}
X
Xundef %groupcount;
X
Xsub nr {  # test like 'sort -nr' for sort function
X    ($anum,$astring) = split(' ',$a);
X    ($bnum,$bstring) = split(' ',$b);
X    if ($anum != $bnum) { -($anum <=> $bnum); }
X    else {-($astring cmp $bstring);}
X}
X			      
X@sorted = sort nr @tosort; # sort most read to least
X
Xopen(MAIL,"|/bin/mail $summarypath");
X
Xprint MAIL "Host\t\t$hostname\n";
Xprint MAIL "Users\t\t$users\n";
Xprint MAIL "NetReaders\t$newsreaders\n";
Xprint MAIL "ReportDate\t$dat\n";
Xprint MAIL "SystemType\tnews-perl-arbitron-2.4\n";
Xprint MAIL join("\n",@sorted), "\n"; # output the sorted data
X
Xclose(MAIL);
END_OF_parbitron
if test 4314 -ne `wc -c <parbitron`; then
    echo shar: \"parbitron\" unpacked with wrong size!
fi
chmod +x parbitron
# end of overwriting check
fi
echo shar: End of shell archive.
exit 0
-- 
The World - Public Access Unix - +1 617-739-9753  24hrs {3,12,24,96,192}00bps

rbj@uunet.uu.net (Root Boy Jim) (05/22/91)

<1991May19.215155.2205@world.std.com> spike@world.std.com (Joe Ilacqua) writes:
?	This is a perl version of the arbitron program.  I wrote
?parbitron because SUNOS 4.0.3's awk kept core dumping on long .newsrc
?lines (under SUNOS 4.1 in prints 'line too long' and aborts).

Yes, but what did arbitron do with nawk or gawk?

What did Brian say?
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane

syd@DSI.COM (Syd Weinstein) (05/22/91)

rbj@uunet.uu.net (Root Boy Jim) writes:
>What did Brian say?
I don't think he cares as long as the format is correct.

I rewrote aribtron in perl, and send him and the world a copy
back in the very early perl days.  (The last mod date to my
arbitron in perl is June 1989)
-- 
=====================================================================
Sydney S. Weinstein, CDP, CCP                   Elm Coordinator
Datacomp Systems, Inc.                          Voice: (215) 947-9900
syd@DSI.COM or dsinc!syd                        FAX:   (215) 938-0235

spike@world.std.com (Joe Ilacqua) (05/22/91)

rbj@uunet.uu.net (Root Boy Jim) writes:
<spike@world.std.com (Joe Ilacqua) writes:
>?	This is a perl version of the arbitron program.  I wrote
<?parbitron because SUNOS 4.0.3's awk kept core dumping on long .newsrc
>?lines (under SUNOS 4.1 in prints 'line too long' and aborts).
<Yes, but what did arbitron do with nawk or gawk?

	SUNOS 4.0.3 does not have nawk, tho as I recall it did work
under 4.1.  gawk worked, occasionally lost a associative array member,
and misscounted.

<What did Brian say?

	He said the output looked good.  That's all that seemed to
matter.

->Spike
-- 
The World - Public Access Unix - +1 617-739-9753  24hrs {3,12,24,96,192}00bps

cudep@warwick.ac.uk (Ian Dickinson) (05/24/91)

In article <1991May21.190916.28112@uunet.uu.net> rbj@uunet.uu.net (Root Boy Jim) writes:
>Yes, but what did arbitron do with nawk or gawk?
>What did Brian say?

My last arbitron run was with Brad Templeton's arbit and a /bin/sh wrapper
script - I might change it to use perl sometime.

It runs in about a quarter of the time and my results got taken this month.

Does anyone know if the contents of the SystemType header matter?
I left mine claiming to be arbitron just in case it did matter.
-- 
\/ato                                                               /'\  /`\
Ian Dickinson                 TED KALDIS FOR PRESIDENT!            /^^^\/^^^\
vato@warwick.ac.uk                                                /TWIN/TEATS\
@c=GB@o=University of Warwick@ou=Computing Services@cn=Ian Dickinson  /       \