spike@world.std.com (Joe Ilacqua) (05/20/91)
This is a perl version of the arbitron program. I wrote parbitron because SUNOS 4.0.3's awk kept core dumping on long .newsrc lines (under SUNOS 4.1 in prints 'line too long' and aborts). I would not say this is faster, but it does work. I have shown this to Brian Reid, and felt that parbitron's output was exceptable. HOWEVER: I STRONGLY RECOMMEND that you run parbitron with "$summarypath" set to YOUR email address AND that you compare the output with that of the shell version before you set it loose. Bugs, complaints, and improvements to me. ->Spike (Spike@World.STD.com) #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of shell archive." # Contents: parbitron # Wrapped by spike@world on Sun May 19 17:40:49 1991 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f parbitron -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"parbitron\" else echo shar: Extracting \"parbitron\" \(4314 characters\) sed "s/^X//" >parbitron <<'END_OF_parbitron' X#!/usr/bin/perl X# parbitron -- a perl version of the program produces rating sweeps for USENET. X X# To participate in the international monthly ratings sweeps, X# run "arbitron" every month. Brian Reid runs the statistics program on the X# first day of each month; it will include any report that has reached it by X# that time. To make sure your site's data is included, run the survey X# program no later than the 20th day of each month. X X# This version of arbitron was written by Spike (Joe Ilacqua), X# spike@world.std.com. It seemed like the right thing to do at the time. X X# Arbitron was originally written by Brian Reid, DEC Western Research Lab, X# reid@decwrl.dec.com] X X# Notes: The Perl version of arbitron intentionally does not support: X# NN's "~/.nn/rc" file, the current version of NN uses the ".newsrc". X# Old B News' 2 field active files. X# You should upgrade your software, or run the shell version of arbitron. X# X# As with the shell arbitron, the results of this program are dependent X# on the rate at which you expire news. If you are a small site that X# expires news rapidly, the results may indicate fewer active readers X# than you actually have. X X# Who to send the report to: X# uucp path: {sun, hplabs, pyramid, decvax, ucbvax}!decwrl!netsurvey X$summarypath = 'netsurvey@decwrl.dec.com'; X X# Range of /etc/passwd UID's that represent actual people (rather than X# maintenance accounts or daemons or whatever) X$lowUID = 5; X$highUID = 9999; X X# If you need to get the active file from another host define activehost. X# The user running parbitron must be able to rsh(1) to the remote host. X#$activehost = 'foo'; X X$active = '/usr/lib/news/active'; X$active = "rsh $activehost cat $active|" if ($activehost); X X$users = 0; # Users who could read news. X$newsreaders = 0; # Users who do read news. X Xchop($date = `date`); X($wday,$mon,$day,$hour,$tz,$year) = split(' ',$date); X$dat="$mon$year"; X X# One of these should return the hostname. Xchop($hostname = `hostname || uname -n || uuname -l`); X Xopen(ACTIVE,$active) || die "Can't open active file: $!\n"; X Xwhile(<ACTIVE>) X{ X next unless /^[a-z][-0-9_a-z]*\./; # from shell arbitron X ($group,$maximum,$minimum) = split; X $groupcount{$group} = 0; X $groupmax{$group} = $maximum; X $groupmin{$group} = $minimum; X} Xclose(ACTIVE); X Xwhile (($user,$pass,$uid,$gid,$quota,$com,$gcos,$dir) = getpwent) { X next if ($uid < $lowUID) || ($uid > $highUID); X $users++; X X next if $homes{$dir}; # Don't do a .newsrc twice X $homes{$dir} = 1; X next if (! -r "$dir/.newsrc"); X open(NEWSRC,"$dir/.newsrc") || next; # This shouldn't fail X X $counted = 0; X X while(<NEWSRC>) { X next if (!/: [0-9]/); X ($group,$arts) = split; X $group =~ s/://; X next unless defined($groupcount{$group}); # bogus group X next if $hits{$group}; # Don't count a group twice X $hits{$group} = 1; X X $maximum = $groupmax{$group}; X $minimum = $groupmin{$group}; X next if $minimum == $maximum; # No articles if $minimum == $maximum X X# We want the last article read from the line in the .newsrc, it is X# a comma septated number or range (i.e ...,415 or ...,3001-3078) X X @arts = split(',',$arts); # Split the line up on ","s X# Spilt the last element on "-" if need be X @arts = split('-',$arts[$#arts]) if ($arts[$#arts] =~ /-/); X if (($arts[$#arts] >= $groupmin{$group}) X && ($arts[$#arts] <= $groupmax{$group})) { X $groupcount{$group}++; X if (!$counted) { X $newsreaders++; # We have found another reader of news X $counted++; # only count them once! X } X } X } X undef %hits; X close(NEWSRC); X} X Xundef %groupmax; Xundef %groupmin; Xundef %homes; X X$i = 0; X Xwhile (($group,$count) = each %groupcount) { X $tosort[$i++] = "$count $group"; X} X Xundef %groupcount; X Xsub nr { # test like 'sort -nr' for sort function X ($anum,$astring) = split(' ',$a); X ($bnum,$bstring) = split(' ',$b); X if ($anum != $bnum) { -($anum <=> $bnum); } X else {-($astring cmp $bstring);} X} X X@sorted = sort nr @tosort; # sort most read to least X Xopen(MAIL,"|/bin/mail $summarypath"); X Xprint MAIL "Host\t\t$hostname\n"; Xprint MAIL "Users\t\t$users\n"; Xprint MAIL "NetReaders\t$newsreaders\n"; Xprint MAIL "ReportDate\t$dat\n"; Xprint MAIL "SystemType\tnews-perl-arbitron-2.4\n"; Xprint MAIL join("\n",@sorted), "\n"; # output the sorted data X Xclose(MAIL); END_OF_parbitron if test 4314 -ne `wc -c <parbitron`; then echo shar: \"parbitron\" unpacked with wrong size! fi chmod +x parbitron # end of overwriting check fi echo shar: End of shell archive. exit 0 -- The World - Public Access Unix - +1 617-739-9753 24hrs {3,12,24,96,192}00bps
rbj@uunet.uu.net (Root Boy Jim) (05/22/91)
<1991May19.215155.2205@world.std.com> spike@world.std.com (Joe Ilacqua) writes:
? This is a perl version of the arbitron program. I wrote
?parbitron because SUNOS 4.0.3's awk kept core dumping on long .newsrc
?lines (under SUNOS 4.1 in prints 'line too long' and aborts).
Yes, but what did arbitron do with nawk or gawk?
What did Brian say?
--
[rbj@uunet 1] stty sane
unknown mode: sane
syd@DSI.COM (Syd Weinstein) (05/22/91)
rbj@uunet.uu.net (Root Boy Jim) writes: >What did Brian say? I don't think he cares as long as the format is correct. I rewrote aribtron in perl, and send him and the world a copy back in the very early perl days. (The last mod date to my arbitron in perl is June 1989) -- ===================================================================== Sydney S. Weinstein, CDP, CCP Elm Coordinator Datacomp Systems, Inc. Voice: (215) 947-9900 syd@DSI.COM or dsinc!syd FAX: (215) 938-0235
spike@world.std.com (Joe Ilacqua) (05/22/91)
rbj@uunet.uu.net (Root Boy Jim) writes: <spike@world.std.com (Joe Ilacqua) writes: >? This is a perl version of the arbitron program. I wrote <?parbitron because SUNOS 4.0.3's awk kept core dumping on long .newsrc >?lines (under SUNOS 4.1 in prints 'line too long' and aborts). <Yes, but what did arbitron do with nawk or gawk? SUNOS 4.0.3 does not have nawk, tho as I recall it did work under 4.1. gawk worked, occasionally lost a associative array member, and misscounted. <What did Brian say? He said the output looked good. That's all that seemed to matter. ->Spike -- The World - Public Access Unix - +1 617-739-9753 24hrs {3,12,24,96,192}00bps
cudep@warwick.ac.uk (Ian Dickinson) (05/24/91)
In article <1991May21.190916.28112@uunet.uu.net> rbj@uunet.uu.net (Root Boy Jim) writes: >Yes, but what did arbitron do with nawk or gawk? >What did Brian say? My last arbitron run was with Brad Templeton's arbit and a /bin/sh wrapper script - I might change it to use perl sometime. It runs in about a quarter of the time and my results got taken this month. Does anyone know if the contents of the SystemType header matter? I left mine claiming to be arbitron just in case it did matter. -- \/ato /'\ /`\ Ian Dickinson TED KALDIS FOR PRESIDENT! /^^^\/^^^\ vato@warwick.ac.uk /TWIN/TEATS\ @c=GB@o=University of Warwick@ou=Computing Services@cn=Ian Dickinson / \