pat@kla (02/11/90)
Here is a version of the usenet statistics gathering `arbitron' program written in perl. The advantages are listed in the comments at the beginning of the file. It should be run on a machine with read access to all user's home directories (for access to their .newsrc files.) I consider this a Beta version, submitted for testing against a variety configurations. Most of this program should be compatible with perl 2.18, but for full functionality, perl 3.0 is required (for socket and getpwent() support). For reference, we are running a mix of Sun hardware (Sun3, Sun4, Sun386i) and system versions (3.5, LeftCoast 4.0.3, RightCoast 4.0.1). We receive our news via uucp and maintain it under B-news 2.11 with relatively short expiration times. News is distributed within our LAN via NNTP & rrn. ------------------------- Cut Here ------------------------------------------- #! /bin/perl $VERSION = "pl-$1" if ( '$Revision: 1.6 $' =~ /([0-9]*\.[0-9]*) \$/ ) ; # # # arbitron.pl -- this program produces rating sweeps for USENET # # # Usage: arbitron.pl # or: perl arbitron.pl # # This program is intended as a replacement for the sh/sed/awk/... based # script in common use. This version offers the following advantages: # # 1) Greater portability. # (Perl runs on a lot more systems than you can find compatible # versions of Bourne shell, sed, awk, etc.) # 2) More accurate reporting if the local expiration rate is very high. # The shell script compared .newsrc contents against the active # file. This version keeps an optional arb.last file to compare # against. # 3) No temporary files. # 4) Can be run on systems which do not support multi-tasking. # 5) If your mailer allows the specification of the subject as # a command line parameter, this program will generate a # subject line of the form `arbitron data for MonYYYY'. # 6) Direct connection to NNTP socket in distributed environment. # (Perl 3.0 is required for this option.) # 7) Much easier to read. (Unless you are an awk & sed wizard :-) # # Disadvantages: # 1) This version is slower than the Bourne shell version. # (about 50% on a Sun3/60 running SunOS 3.5) # # Other differences: # 1) Obtaining count of valid users: # # The script defaults to accepting all users within the specified # range of userids; but provides optional mechanisms for counting # actual userids stored in the wtmp(5) file. # # This program provides a generalized filter capability which can # easily check the value of any of the fields from the password # file. It was felt that this is a more useful mechanism in # a networked environment. # # # To use this program, edit the "configuration" section below so that the # information is correct for your site, and then run it. It will produce a # readership survey for your machine and mail that survey to decwrl.dec.com, # with a cc to you. # # To participate in the international monthly ratings sweeps, run "arbitron" # every month. The statistics program is run on the first day of each month; # it will include any report that has reached it by that time. To make sure # your site's data is included, run the survey program no later than the 20th # day of each month. # # --------------------------------------------------------------------------- # $Log: arbitron.pl,v $ # Revision 1.6 90/02/10 18:53:02 pat # Added support for direct NNTP access of active group list. # Extended comments. # Removed several unnecessary variables. # Added debugging flags $noSave and $noMail. # # Revision 1.5 90/01/11 20:27:32 pat # Added get_user_info() subroutine to allow the use of getpwent() with # perl3.0, but retain compatibility with perl2.18. Also allowed # cleanup of code at beginning of user processing loop. # # Revision 1.4 90/01/11 19:01:40 pat # Changed subroutine good_user and the loop that calls it to expect parameters # as supplied by the perl-3.0 getpwent() function. # # Revision 1.3 90/01/11 17:26:41 pat # Various clarifications in the comments. # Moved date retrieval lines to just under initial comment cluster to obtain # better grouping of the sections which are most likely to be customized. # Changed default summary path to check for LOGNAME, then USER, to determine # the local address to mail the report to. This is to accomidate # alternate user names created purely for cron-activated jobs, etc. # Removed intermediate variable used in determining host name. # Changed generation of $destination to make it easier to remove/modify the # subject parameters. # Explicitly look for /usr/ucb/mail when building $destination, instead of # relying upon path information. # # Revision 1.2 90/01/10 09:59:15 pat # Fixed automatic extraction of $VERSION from RCS revision information. # Added RCS update log. # # --------------------------------------------------------------------------- # Debugging flags: # Set $noSave to prevent writing the .arblast file. # Set $noMail to write the report to stdout instead of mailing it. $noSave = ; $noMail = ; # --------------------------------------------------------------------------- # # Find out the date. Sweeps are reported as MonYYYY. @monthnames = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") ; @tim = localtime(time) ; $dat = $monthnames[$tim[4]] . $tim[5] ; # =========================================================================== # Configuration information. Edit this section to reflect your site data. # # $ARBLAST is the file to which the `last-reported' information # is to be stored. # $ACTIVE is the path to the news system's active file. It is only # used if $NNTPserver is not set. # $SORT_OUTPUT should be set if you want the report sorted by # readership. Setting it to null will improve performance. # $NNTPserver is the hostname or internet address of the local # NNTP server. Direct NNTP connection requires a version # of perl which supports sockets. If your version does # not support sockets, or you do not use NNTP, you may # simply set $NNTPserver to ''. # $NNTPport is the port number to use to connect to your NNTP server. # The standard allocation is 119. Check your /etc/services # file or equivalent. # $Me is the mail address of the local user to whom a copy of the # report will be mailed. It may be null. # $summarypath is a list of addresses to which a copy of the report # will be mailed. It should be in the form expected by # the mailer. To have your statistics included in the # published network-wide reports, this list should include # `netservey@decwrl.dec.com' (or the uucp equivalent # {sun, hplabs, pyramid, decvax, ucbvax}!decwrl!netsurvey ) # $lowUID and $highUID delimit the range of /etc/passwd UID's which # represent actual people (rather than maintenance accounts, # daemons, or whatever.) # $ARBLAST = "/usr/lib/news/arb.last" ; $ACTIVE = '/usr/lib/news/active' ; $SORT_OUTPUT = 1 ; # Set to null if you don't want to sort $SERVERfile = '/vol/local/lib/rn/server' ; $NNTPserver = $ENV{'NNTPSERVER'} ; if ((! $NNTPserver) && (open (server, "<$SERVERfile"))) { chop ($NNTPserver = <server>) ; close (server) ; } ; if (! $NNTPserver) { $NNTPserver = 'mailhost' ; } ; $NNTPport = 119 ; $Me = ( $ENV{'LOGNAME'} || $ENV{'USER'} || 'news' ) ; $summarypath = "netsurvey@decwrl.dec.com $Me" ; $lowUID = 100 ; $highUID = 9999 ; # This subroutine is passed an array consisting of a single entry from # the password file. It should return true if that entry represents a # valid user (to check for news usage), false otherwise. # (Return value is value of last expression.) # # You may make this as complex as you deem appropriate. The default will # reject any user with the password '*'; or with a uid less than $lowUID, # or greater than $highUID; or with no home directory. # The commented out lines show how to also reject specific users by name. sub good_user { local ($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $homedir, $shell) = @_ ; return ( ($passwd ne '*') && ($uid >= $lowUID) && ($uid <= $highUID) && ($homedir ne '') && ($name ne "realtime") && ($name ne "std_user") && ($name ne "eglbnch1") ) ; } # This is only used in the hostname stuff below. # We need to find the uucp name of your host. If this code doesn't work, # then just put it in literally like this: # $hostname="kla" # # `uuname -n' doesn't work on bsd systems. `hostname' may not work on # sysV or non-unix systems. For bsd, the hostname output is preferred # because `uuname -l' will truncate to 7 chars. I'm not sure why it # is not the preferred mechanism on sysV; but this is the sequence used # in the script version. if ( -d '/usr/ucb' ) { chop ($hostname = `hostname`) ; } else { chop ($hostname = `uuname -n`) ; } ; chop ($hostname = `uuname -l`) unless ($hostname) ; chop ($hostname = `hostname`) unless ($hostname) ; # The name of the base password file. This will not be referenced if # passwords can be obtained via getpwent or `ypcat passwd'. $PASSWD = '/etc/passwd' ; # --------------------------------------------------------------------------- # If your mailer can accept a subject on the command line, set $Subject # to the appropriate sequence, otherwise comment out the assignment below. # The default below works with /usr/ucb/mail on bsd derived systems. $Subject = sprintf ('-s "arbitron data for %s"', $dat) ; # If you are running on a system which does not support the creation # of a mailer subtask to which the report can be piped, replace # the $destination value with the name of a file which can be mailed # later. $Mailer = '/usr/ucb/mail' if ( -x '/usr/ucb/mail' ) ; $Mailer = 'mail' unless ($Mailer) ; $Mailer = $ENV{'MAILER'} || $Mailer ; $destination = ($noMail ? ">-" : "| $Mailer $Subject $summarypath") ; # End portion most likely to require site customization. # =========================================================================== # # Determine what newsgroups are active # if ($NNTPserver) { local ($sockaddr, $name, $aliases, $proto, $type, $len, $thisaddr, $thataddr, $this, $that) ; # If your perl does not support sockets, comment out the following # code and make sure that $NNTPserver is set to '' above. # Note: You may have to run makelib on sys/socket.h to create a # perl version. Makelib should be in your perl distribution. do 'sys/socket.h' || die "Can't do sys/socket.h: $@" ; $sockaddr = 'S n a4 x8'; chop ($hostname = `hostname`); ($name, $aliases, $proto) = getprotobyname('tcp') ; ($name, $aliases, $NNTPport) = getservbyname ($NNTPport, 'tcp') unless $NNTPport =~ /^\d+$/ ; ($name, $aliases, $type, $len, $thisaddr) = gethostbyname ($hostname) ; ($name, $aliases, $type, $len, $thataddr) = gethostbyname ($NNTPserver) ; $this = pack ($sockaddr, &AF_INET, 0, $thisaddr) ; $that = pack ($sockaddr, &AF_INET, $NNTPport, $thataddr) ; socket (active, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!" ; bind (active, $this) || die "bind: $!" ; connect (active, $that) || die "connect: $!" ; select (active) ; $| = 1 ; select (stdout) ; $_ = <active> ; die $_ unless m/^2/ ; # Extremely primitive result check. print active "LIST\r\n" ; # Request list of newsgroups. $_ = <active> ; die $_ unless m/^215/ ; # Extremely primitive result check. # End of socket-dependant stuff. } else { open (active, "$NEWS/active") || die "Could not open active file $NEWS/active" ; } ; sub get_active_line { local ($line) ; $line = <active> ; if ($NNTPserver && ($line eq ".\r\n")) { print active "QUIT\r\n" ; $line = '' ; } ; return ($line) ; } for ($nactive = 0 ; $_ = do get_active_line() ; ) { if (/(\S*)\s*([0-9]*)\s*([0-9]*)/) { $1 =~ tr/A-Z/a-z/ ; # Canonicalize to lower case $last{$1} = int ($2) ; $first{$1} = int ($3) ; $readers{$1} = 0 ; $nactive++ ; } } if ($nactive <= 0) { die "No active newsgroups" ; } if (open (arbLast, $ARBLAST)) { while ( <arbLast> ) { if ( ($groupname,$limit) = /(\S*)\s([0-9]*)/ ) { $first{$groupname} = int($limit) + 1 ; } } close (arbLast) ; } # --------------------------------------------------------------------------- # Grovel through the password file, counting users and processing # .newsrc files. $nusers = 0 ; # Total number of users $newsusers = 0 ; # Number of users who read news. # Return information about the current user. This routine is entirely to # provide getpwent() emulation for older versions of perl. # sub get_user_info { # Use this line if your perl has getpwent(); otherwise comment it out return getpwent() ; # Un-comment the following lines if your perl does not have getpwent(). # local ($name,$passwd,$uid,$gid,$gcos,$homedir,$shell) ; # # # if ( ! $PasswdIsOpen) { # open (passwd, 'ypcat passwd |') || open (passwd, $PASSWD) ; # $PasswdIsOpen = 1 ; # } # # return () if eof (passwd) ; # # ($name,$passwd,$uid,$gid,$gcos,$homedir,$shell) # = (<passwd> =~ /(.*):(.*):(.*):(.*):(.*):(.*):(.*)\n/ ) ; # # return ($name,$passwd,$uid,$gid,'','',$gcos,$homedir,$shell) ; } # Grovel through the user list, eliminating those which do not appear # to be `real users'. If a `real user' has a .newsrc file, process it. while (( @UserInfo = do get_user_info()) && ($#UserInfo > 0)) { if (do good_user (@UserInfo)) { $nusers++ ; # Increment the count of valid users. if (open (newsrc, $UserInfo[7] . "/.newsrc")) { local ($groups_read, $group_name, $last_read) ; $groups_read = 0 ; while ( <newsrc> ) { if ( /(\S*)[:!].*[,-]([0-9]+)$/ ) { ($group_name = $1) =~ tr/A-Z/a-z/ ; # Make lower case $last_read = int ($2) ; if ( ($last_read >= $first{$group_name}) && ($last_read <= $last{$group_name}) ) { $groups_read++ ; $readers{$group_name}++ ; } } } $newsusers++ if ($groups_read) ; } } } # Now build the report open (Report, "$destination") || die "Could not open mailer" ; $noSave || open (arbLast, ">$ARBLAST") || die "Could not open arb.last" ; print Report "Host\t\t$hostname\n" ; print Report "Users\t\t$nusers\n" ; print Report "NetReaders\t$newsusers\n" ; print Report "ReportDate\t$dat\n" ; print Report "SystemType\tnews-arbitron-$VERSION\n" ; if ($SORT_OUTPUT) { sub by_readership { local ($val) ; $val = $readers{$b} - $readers{$a} ; return ( ($val != 0) ? $val : (($b le $a) ? -1 : 1) ) ; } @allgroups = keys (readers) ; foreach $groupname (sort by_readership @allgroups) { print Report $readers{$groupname} . " $groupname\n" ; print arbLast "$groupname " . $last{$groupname} . "\n" unless $noSave ; } } else { while (($groupname, $Readers) = each (readers)) { print Report "$Readers $groupname\n" ; print arbLast "$groupname " . $last{$groupname} . "\n" unless $noSave ; } } exit ; Copyright (c) 1989 PM Lashley under the terms of the GNU General Public License PMLashley ...{sun | megatest | sts | zygot}!cohesive!kla!pat <<< I haven't lost my mind. It's backed up on tape somewhere... >>>